Abstract. The uniqueness of sparsest solutions of underdetermined linear systems plays a fundamental role in the newly developed compressed sensing theory. Several new algebraic concepts, including the sub-mutual coherence, scaled mutual coherence, coherence rank, and sub-coherence rank, are introduced in this paper in order to develop new and improved sufficient conditions for the uniqueness of sparsest solutions. The coherence rank of a matrix with normalized columns is the maximum number of off-diagonal absolute entries in a row of its Gram matrix that are equal to the mutual coherence. The main result of this paper claims that when the coherence rank of a matrix is low, the mutual-coherence-based uniqueness conditions for the sparsest solution of a linear system can be improved. Furthermore, we prove that the Babel-function-based uniqueness condition can also be improved by the so-called sub-Babel function. Moreover, we show that scaled-coherence-based uniqueness conditions can be developed, and that the right-hand-side vector b of a linear system, the support overlap of solutions, the orthogonal matrix obtained from the singular value decomposition of a matrix, and the range property of a transposed matrix can also be integrated into the criteria for the uniqueness of the sparsest solution of an underdetermined linear system.
1 Introduction



Consider an underdetermined system of linear equations

$$Ax = b,$$

where $A$ is a given $m \times n$ matrix with $m < n$, and $b \in \mathbb{R}^m$ is a given vector. Throughout this paper, we assume that $A$ has at least two rows, i.e., $m \geq 2$. Since the system is underdetermined, it has infinitely many solutions whenever it is consistent. Seeking the sparsest solution of an underdetermined linear system has recently become an important and common task in many applications such as signal and image processing, compressed sensing, computer vision, statistical and financial model selection, and machine learning (see e.g., [2, 8, 18, 31, 21] and the references therein). Let $\|x\|_0$ denote the number of nonzero components of the vector $x \in \mathbb{R}^n$. Then finding a sparsest solution of a linear system amounts to the so-called $\ell_0$-minimization problem

$$\min\{\|x\|_0 : Ax = b\},$$

which is known to be NP-hard [29, 1]. An intensive study of this problem has been carried out over the past few years (see e.g., [10, 14, 18, 31, 21]), and it continues to grow in both theory and computational methods, stimulating further cross-disciplinary applications (see e.g., [9, 28, 30, 21]). However, the understanding of $\ell_0$-problems, from theory to computational methods, remains very incomplete at the moment [8, 21]. For instance, the fundamental question of when an $\ell_0$-problem admits a unique solution has not yet been completely addressed, and many existing uniqueness claims remain restrictive. The main purpose of this paper is to establish some new and improved sufficient conditions for a linear system to have a unique sparsest solution.
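The combinatorial nature of the $\ell_0$-minimization problem can be made concrete with a minimal exhaustive-search sketch in Python with NumPy. The function name `l0_min_bruteforce` and the relative tolerance `tol` are illustrative assumptions, not part of the paper. The sketch tries supports in order of increasing size and returns the first exact fit, so its cost grows with the number of subsets of $\{1, \dots, n\}$ and it is only practical for tiny $n$.

```python
import itertools

import numpy as np


def l0_min_bruteforce(A, b, tol=1e-9):
    """Exhaustive search for min{||x||_0 : Ax = b} (exponential in n).

    Supports are tried in order of increasing size; the first support whose
    least-squares fit reproduces b (up to a relative tolerance) is returned.
    Returns None if Ax = b has no solution.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    n = A.shape[1]
    scale = max(1.0, np.linalg.norm(b))
    if np.linalg.norm(b) <= tol * scale:
        return np.zeros(n)  # x = 0 is the sparsest solution of Ax = 0
    for k in range(1, n + 1):
        for support in itertools.combinations(range(n), k):
            cols = list(support)
            coef, *_ = np.linalg.lstsq(A[:, cols], b, rcond=None)
            if np.linalg.norm(A[:, cols] @ coef - b) <= tol * scale:
                x = np.zeros(n)
                x[cols] = coef
                return x
    return None


# Usage: the 3 x 4 matrix of Example 2.11 and a 1-sparse right-hand side.
A = np.array([[0.0010, -0.7998, -0.6002, 0.0717],
              [0.8001, -0.3558, 0.4798, -0.1913],
              [0.5999, 0.4801, -0.6398, -0.6412]])
x_true = np.array([0.0, 0.0, 2.0, 0.0])
x_hat = l0_min_bruteforce(A, A @ x_true)
print(np.count_nonzero(np.abs(x_hat) > 1e-8), np.round(x_hat, 6))
```

For any realistic $n$ this search is hopeless, which is exactly why checkable uniqueness conditions such as those developed below matter.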

The uniqueness of sparsest solutions of underdetermined linear systems is key to the newly developed compressed sensing theory, leading to a significant impact on sparse signal and image processing [10, 14, 21]. So far, sufficient conditions for the uniqueness of sparsest solutions have been developed largely via such matrix properties as the unique representation property [24], spark [15], mutual coherence [17], restricted isometry property (RIP) [11], null space property (NSP) [13, 35], exact recovery condition [32], range property of $A^T$ [36], and the verifiable conditions [26, 27]. A crucial property for the study of uniqueness is the spark, denoted by $\mathrm{Spark}(A)$, which is the smallest number of columns of the matrix $A$ that are linearly dependent. The spark guarantees the uniqueness of sparsest solutions, as shown by the result below.

Theorem 1.1 ([15]). If a linear system $Ax = b$ has a solution $x$ satisfying $\|x\|_0 < \mathrm{Spark}(A)/2$, then $x$ is the unique sparsest solution to the system.
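For very small matrices the spark can be computed by brute force, which is handy for checking the bounds discussed below on toy examples. The Python/NumPy sketch below is illustrative only: the helper names `spark` and `unique_by_spark` are hypothetical, and rank decisions rely on the default tolerance of `numpy.linalg.matrix_rank`. It returns the size of the smallest linearly dependent column subset and then applies the test of Theorem 1.1.

```python
import itertools

import numpy as np


def spark(A):
    """Smallest number of linearly dependent columns of A (brute force).

    The cost is exponential in n; rank decisions use numpy's default SVD
    tolerance, so nearly dependent columns may be judged independent.
    Returns numpy.inf if all columns are linearly independent.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[1]
    for k in range(1, n + 1):
        for cols in itertools.combinations(range(n), k):
            if np.linalg.matrix_rank(A[:, list(cols)]) < k:
                return k
    return np.inf


def unique_by_spark(A, x, zero_tol=1e-12):
    """Theorem 1.1: if ||x||_0 < Spark(A)/2, x is the unique sparsest solution."""
    support_size = np.count_nonzero(np.abs(x) > zero_tol)
    return support_size < spark(A) / 2


A = np.array([[0.0010, -0.7998, -0.6002, 0.0717],
              [0.8001, -0.3558, 0.4798, -0.1913],
              [0.5999, 0.4801, -0.6398, -0.6412]])
print(spark(A))                                            # 4
print(unique_by_spark(A, np.array([0.0, 1.5, 0.0, 0.0])))  # True, since 1 < 4/2
```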

The spark is difficult to compute. However, any computable lower bound $\phi(A) \leq \mathrm{Spark}(A)$ produces a checkable sufficient condition for uniqueness, such as $\|x\|_0 < \phi(A)/2$. The mutual coherence of a matrix (see the definition in section 2), denoted by $\mu(A)$, is such a property (e.g., [17, 19, 15, 25, 22, 32]); it yields the following computable lower bound of the spark:

$$1 + \frac{1}{\mu(A)} \leq \mathrm{Spark}(A), \qquad (1)$$

which, together with Theorem 1.1, implies the following uniqueness claim.
Theorem 1.2 ([17, 23, 19]). If a linear system $Ax = b$ has a solution $x$ obeying

$$\|x\|_0 < \left(1 + \frac{1}{\mu(A)}\right)\Big/2, \qquad (2)$$

then $x$ is the unique sparsest solution to the system.

The condition (2) is restrictive in many cases. In [32], the Babel function, denoted by $\mu_1(p)$, is introduced and shown to satisfy $\mathrm{Spark}(A) \geq \min\{p : \mu_1(p-1) \geq 1\} \geq 1 + 1/\mu(A)$, yielding the following uniqueness condition, which is stronger than (2).

Theorem 1.3 ([32]). If a linear system $Ax = b$ has a solution $x$ obeying

$$\|x\|_0 < \frac{1}{2}\min\{p : \mu_1(p-1) \geq 1\}, \qquad (3)$$

then $x$ is the unique sparsest solution to the system.

Theorems 1.2 and 1.3 are valid for general matrices. When $A = [\Phi, \Psi]$ is a concatenation of two orthogonal matrices, Elad and Bruckstein [19] have shown that (2) can be improved to $\|x\|_0 < 1/\mu(A)$, and when $A$ consists of $J$ concatenated orthogonal bases, Gribonval and Nielsen [25] have shown that the uniqueness condition can be stated as $\|x\|_0 < \frac{1}{2}\left(1 + \frac{1}{J-1}\right)/\mu(A)$. For a general matrix $A$, however, it remains important, from a mathematical point of view, to address the question: how can the bounds (2) and (3) be improved?

In this paper, we answer this question by means of the classic Brauer theorem. To this end, we extract and use more properties of a matrix than the mutual coherence alone, including the sub-mutual coherence, which is the second largest absolute inner product between two normalized columns of a matrix, and the so-called coherence rank, which turns out to be an important property for the uniqueness of sparsest solutions. The sub-Babel function is also taken into account in order to enhance the result of Theorem 1.3. One of our main results claims that for a general matrix $A$, when the coherence rank of $A$ is smaller than $1/\mu(A)$, the lower bound (1) of $\mathrm{Spark}(A)$, and thus the condition (2), can be improved.

Note that the spark of a matrix is invariant under nonsingular scalings (see section 4 for details), but the mutual coherence is not. This leaves room for a further improvement of the coherence-based uniqueness conditions. Thus we introduce the concept of the scaled mutual coherence in section 4, which enables us to establish a general coherence-based uniqueness condition, leading to an optimal lower bound of the spark in a certain sense. We demonstrate two immediate applications of the scaled mutual coherence. Note that the existing uniqueness conditions use matrix properties only, and the role of $b$ is completely overlooked. The sparsity of a solution, however, can also depend on the right-hand-side vector $b$ of a linear system. How to integrate $b$ into a uniqueness condition for sparsest solutions is worth addressing (as pointed out by Bruckstein et al. [8]). The first application of the scaled mutual coherence yields a uniqueness condition that depends on the properties of $A$ and $b$ together. The second application is a uniqueness criterion for sparsest solutions via the mutual coherence of an orthogonal matrix obtained from the singular value decomposition of $A$ (see section 4.2 for details).

All the above-mentioned results are developed by identifying a lower bound for the spark of a matrix. Any improvement of the spark condition in Theorem 1.1 leads to a further enhancement of these results. Although it is hard to improve Theorem 1.1 in general, it is possible to do so in some situations. We show that the support overlap of solutions of a linear system is information that can be used to achieve this goal (see section 5 for details). Finally, we introduce certain range properties of a matrix that guarantee a unique sparsest solution to a linear system. Similar to the RIP [11, 10] and the NSP [13, 35], the range property arises naturally from the analysis of the uniform recovery of sparse signals (as shown in [36]).

This paper is organized as follows. We introduce several new concepts in section 2 and use them to develop improved uniqueness conditions for sparsest solutions. The improvement of the Babel-function-based condition, and the comparison of the existing conditions with those developed in this paper, are given in section 3. The scaled-coherence-based uniqueness conditions and their applications are discussed in section 4. A further improvement of the spark condition via the support overlap of solutions is demonstrated in section 5, and the range-property-based uniqueness is briefly introduced in section 6.



2 Improved conditions for uniqueness of sparsest solutions 

Let $a_i$, $i = 1, \dots, n$, be the columns of $A$. Recall that the mutual coherence of $A$ (see e.g., [17, 8]) is defined as

$$\mu(A) = \max_{i \neq j} \frac{|a_i^T a_j|}{\|a_i\|_2 \cdot \|a_j\|_2}.$$

So $\mu(A)$ is the maximum absolute value of the inner product between the normalized columns of $A$. The lower bound (1) plays a vital role in the development of the uniqueness theory and the performance guarantees of such algorithms as (orthogonal) matching pursuit, $\ell_1$-minimization, and iterative thresholding algorithms for the sparsest solution of linear systems (see e.g., [17, 19, 15, 22, 32, 33, 8, 18, 3, 4]). Any improvement of this lower bound may lead to an enhancement of many existing results in this field. In what follows, we develop an improved lower bound for $\mathrm{Spark}(A)$ that leads to improved sufficient conditions for a linear system to have a unique sparsest solution. Let us begin with a few concepts.
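Computationally, $\mu(A)$ is the largest off-diagonal entry of the absolute Gram matrix of the column-normalized $A$. The following NumPy sketch (function names are our own, and it assumes $A$ has no zero column) evaluates $\mu(A)$, the spark lower bound (1), and the sparsity threshold in (2); the usage lines take the matrix of Example 2.11.

```python
import numpy as np


def mutual_coherence(A):
    """mu(A) = max_{i != j} |a_i^T a_j| / (||a_i||_2 ||a_j||_2)."""
    A = np.asarray(A, dtype=float)
    An = A / np.linalg.norm(A, axis=0)  # normalize columns (assumes no zero column)
    G = np.abs(An.T @ An)               # absolute Gram matrix
    np.fill_diagonal(G, 0.0)            # ignore the unit diagonal
    return G.max()


def coherence_bounds(A):
    """Return (mu, lower bound (1) on Spark(A), threshold (2) on ||x||_0)."""
    mu = mutual_coherence(A)
    spark_lb = 1.0 + 1.0 / mu
    return mu, spark_lb, spark_lb / 2.0


A = np.array([[0.0010, -0.7998, -0.6002, 0.0717],
              [0.8001, -0.3558, 0.4798, -0.1913],
              [0.5999, 0.4801, -0.6398, -0.6412]])
mu, spark_lb, threshold = coherence_bounds(A)
print(round(mu, 4), round(threshold, 4))
```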

2.1 Sub-mutual coherence, coherence rank, and sub-coherence rank 

Let us sort the distinct values of $|a_i^T a_j| / (\|a_i\|_2 \cdot \|a_j\|_2)$, $i \neq j$, in descending order, and denote them by

$$\mu^{(1)}(A) > \mu^{(2)}(A) > \cdots > \mu^{(k)}(A).$$

Clearly, the largest one is $\mu^{(1)}(A) = \mu(A)$, the mutual coherence.

Definition 2.1. The sub-mutual coherence of $A$, $\mu^{(2)}(A)$, is the second largest absolute inner product between two normalized columns of $A$:

$$\mu^{(2)}(A) = \max_{i \neq j}\left\{ \frac{|a_i^T a_j|}{\|a_i\|_2 \cdot \|a_j\|_2} : \frac{|a_i^T a_j|}{\|a_i\|_2 \cdot \|a_j\|_2} < \mu(A) \right\}.$$



In order to introduce the next useful property of a matrix, let us consider the index sets

$$S_i(A) := \left\{ j : j \neq i, \ \frac{|a_i^T a_j|}{\|a_i\|_2 \cdot \|a_j\|_2} = \mu(A) \right\}, \quad i = 1, \dots, n.$$
Without loss of generality, we assume that the columns of $A$ are normalized. It is easy to see that $S_i(A)$ collects the positions of the absolute entries equal to $\mu(A)$ in the $i$th row of $G = A^T A$, the Gram matrix of $A$. Clearly, at least one of these sets is nonempty, since the largest off-diagonal absolute entry of $G$ is equal to $\mu(A)$. Denote the cardinality of $S_i(A)$ by $\alpha_i(A)$, i.e.,

$$\alpha_i(A) = |S_i(A)|, \quad i = 1, \dots, n.$$

Clearly, $0 \leq \alpha_i(A) \leq n - 1$. Let

$$\alpha(A) = \max_{1 \leq i \leq n} \alpha_i(A) = \max_{1 \leq i \leq n} |S_i(A)|, \qquad (4)$$

which is a positive integer. Let $i_0$ be an index such that

$$\alpha(A) = \alpha_{i_0}(A) = |S_{i_0}(A)|,$$

i.e., the $i_0$th row of $G$ has the maximal number of absolute entries equal to $\mu(A)$. Then we may define

$$\beta(A) = \max_{1 \leq i \leq n,\ i \neq i_0} \alpha_i(A) = \max_{1 \leq i \leq n,\ i \neq i_0} |S_i(A)|, \qquad (5)$$

which is the second largest number among $\alpha_i(A)$, $i = 1, \dots, n$.

Definition 2.2. $\alpha(A)$, given by (4), is called the coherence rank of $A$, and $\beta(A)$, given by (5), is called the sub-coherence rank of $A$.

For a given matrix $A$ with normalized columns, both $\alpha(A)$ and $\beta(A)$ can be easily obtained through its Gram matrix $G = A^T A$ or its absolute Gram matrix, denoted by $\mathrm{abs}(G)$. By the definition of $\mu(A)$, there exists at least one off-diagonal absolute entry of $G$, say $|G_{ij}|$ (in the $i$th row), which is equal to $\mu(A)$. By the symmetry of $G$, we also have $|G_{ji}| = \mu(A)$ (in the $j$th row of $G$). Thus the symmetry of $G$ implies that $\beta(A) \geq 1$. So, for any matrix $A$, we have the relation

$$1 \leq \beta(A) \leq \alpha(A). \qquad (6)$$

Geometrically, $\alpha(A)$ can be called the equiangle of $A$, in the sense that it is the maximum number of columns of $A$ that form the same angle, namely the angle attaining $\mu(A)$, with a given column of $A$, say the $i_0$th column.

Remark 2.3. When all columns of $A$ are generated by a single vector, then $\mu(A) = 1$ and $\alpha(A) = \beta(A) = n - 1$. When $A$ has at least two independent columns (not all columns are generated by a single vector), then $\alpha(A) < n - 1$. For the concatenation of two orthogonal bases $A = [\Phi, \Psi]$, where $\Phi, \Psi$ are $m \times m$ orthogonal matrices, we see that $\alpha(A) \leq n/2 = m$. As we have pointed out, $\mu(A)$, $\mu^{(2)}(A)$, $\alpha(A)$ and $\beta(A)$ can all be obtained directly from the Gram matrix of $A$. For example, when $A$ is given by

$$A = \begin{pmatrix} -0.9802 & 0.1 & 0.3521 & 0.9239 & 0.9239 & 0.7405 \\ -1.8282 & 0 & 1.0365 & 0.3827 & -0.3827 & -1.6821 \\ 0.3269 & 0 & 1.3563 & 0 & 0 & -0.2949 \end{pmatrix}, \qquad (7)$$

then the Gram matrix of the normalized $A$ is

$$G = \begin{pmatrix} 1 & -0.4668 & -0.4908 & -0.7644 & -0.0981 & 0.5763 \\ -0.4668 & 1 & 0.2020 & 0.9239 & 0.9239 & 0.3978 \\ -0.4908 & 0.2020 & 1 & 0.4142 & -0.0409 & -0.5803 \\ -0.7644 & 0.9239 & 0.4142 & 1 & 0.7071 & 0.0217 \\ -0.0981 & 0.9239 & -0.0409 & 0.7071 & 1 & 0.7134 \\ 0.5763 & 0.3978 & -0.5803 & 0.0217 & 0.7134 & 1 \end{pmatrix},$$

from which we see that $\mu(A) = 0.9239 > \mu^{(2)}(A) = 0.7644$, and $\alpha(A) = 2 > \beta(A) = 1$.
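Since $\mu(A)$, $\mu^{(2)}(A)$, $\alpha(A)$ and $\beta(A)$ are all read off from the absolute Gram matrix, they take only a few lines to compute. The Python sketch below is one possible implementation, checked against matrix (7). The function name and the tolerance `tol` used to decide that a floating-point entry equals $\mu(A)$ are our assumptions; in exact arithmetic the tolerance would be zero.

```python
import numpy as np


def coherence_stats(A, tol=1e-9):
    """Return (mu, mu2, alpha, beta) for the columns of A.

    mu    : mutual coherence (largest off-diagonal |G_ij|)
    mu2   : sub-mutual coherence (largest |G_ij| strictly below mu), or 0 if none
    alpha : coherence rank, max_i |S_i(A)|
    beta  : sub-coherence rank, max over rows other than one maximizing row i0
    """
    A = np.asarray(A, dtype=float)
    An = A / np.linalg.norm(A, axis=0)
    G = np.abs(An.T @ An)
    off = ~np.eye(G.shape[0], dtype=bool)            # off-diagonal mask
    mu = G[off].max()
    is_mu = off & (np.abs(G - mu) <= tol)            # entries counted in S_i(A)
    below = G[off & ~is_mu]
    mu2 = below.max() if below.size else 0.0
    counts = is_mu.sum(axis=1)                       # alpha_i(A) for every row i
    i0 = int(np.argmax(counts))
    alpha = int(counts[i0])
    beta = int(np.delete(counts, i0).max())
    return mu, mu2, alpha, beta


A7 = np.array([[-0.9802, 0.1, 0.3521, 0.9239, 0.9239, 0.7405],
               [-1.8282, 0.0, 1.0365, 0.3827, -0.3827, -1.6821],
               [0.3269, 0.0, 1.3563, 0.0, 0.0, -0.2949]])
mu, mu2, alpha, beta = coherence_stats(A7)
print(round(mu, 4), round(mu2, 4), alpha, beta)   # 0.9239 0.7644 2 1
```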
2.2 Coherence-rank-based lower bounds for Spark(A) 

Let us first recall Brauer's theorem [7] (see also Theorem 2.3 in [34]), concerning the estimation of eigenvalues of a matrix. Let $\sigma(A) := \{\lambda : \lambda \text{ is an eigenvalue of } A\}$ be the spectrum of $A$.

Theorem 2.4 (Brauer [7]). Let $A = (a_{ij})$ be an $N \times N$ matrix with $N \geq 2$. Then, if $\lambda$ is an eigenvalue of $A$, there is a pair $(r, q)$ of positive integers with $r \neq q$ $(1 \leq r, q \leq N)$ such that

$$|\lambda - a_{rr}| \cdot |\lambda - a_{qq}| \leq \Lambda_r \Lambda_q, \quad \text{where } \Lambda_i := \sum_{j=1, j \neq i}^{N} |a_{ij}| \text{ for } 1 \leq i \leq N.$$

Hence if $K_{ij}(A) = \{z : |z - a_{ii}| \cdot |z - a_{jj}| \leq \Lambda_i \Lambda_j\}$ for $i \neq j$, then $\sigma(A) \subseteq \bigcup_{i \neq j} K_{ij}(A)$.
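Brauer's theorem is easy to check numerically. The following Python sketch is an illustration only (the helper name `in_brauer_ovals` and the small tolerance are ours): it computes the off-diagonal absolute row sums $\Lambda_i$ and tests whether every eigenvalue of a random test matrix lies in at least one Cassini oval $K_{ij}$.

```python
import itertools

import numpy as np


def in_brauer_ovals(M, lam, tol=1e-9):
    """True if lam lies in some Cassini oval
    K_ij = {z : |z - m_ii| |z - m_jj| <= Lambda_i Lambda_j}, i != j."""
    M = np.asarray(M)
    d = np.diag(M)
    radii = np.abs(M).sum(axis=1) - np.abs(d)   # Lambda_i = sum_{j != i} |m_ij|
    # The oval condition is symmetric in (i, j), so unordered pairs suffice.
    return any(
        abs(lam - d[i]) * abs(lam - d[j]) <= radii[i] * radii[j] + tol
        for i, j in itertools.combinations(range(M.shape[0]), 2)
    )


rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
eigs = np.linalg.eigvals(M)
print(all(in_brauer_ovals(M, lam) for lam in eigs))   # True, as Theorem 2.4 asserts
```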

We make use of this classic theorem to prove the following result, which turns out to be an 
improved version of (1) when the coherence rank is low. 

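Theorem 2.5. Let $A \in \mathbb{R}^{m \times n}$ be a matrix with $m < n$, and let $\alpha(A)$ and $\beta(A)$ be defined by (4) and (5), respectively. Suppose that one of the following conditions holds: (i) $\alpha(A) < \frac{1}{\mu(A)}$; (ii) $\alpha(A) \leq \frac{1}{\mu(A)}$ and $\beta(A) < \alpha(A)$. Then $\mu^{(2)}(A) > 0$ and

$$\mathrm{Spark}(A) \geq 1 + \frac{2\big[1 - \alpha(A)\beta(A)\bar\mu(A)^2\big]}{\mu^{(2)}(A)\Big\{\bar\mu(A)\big(\alpha(A) + \beta(A)\big) + \sqrt{[\bar\mu(A)(\alpha(A) - \beta(A))]^2 + 4}\Big\}}, \qquad (8)$$

where $\bar\mu(A) := \mu(A) - \mu^{(2)}(A)$.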

Proof. Normalizing the columns of a matrix does not affect any of $\mathrm{Spark}(A)$, $\mu(A)$, $\mu^{(2)}(A)$, $\alpha(A)$ and $\beta(A)$. Thus, without loss of generality, we assume that all columns of $A$ have unit $\ell_2$-norms. Let $p = \mathrm{Spark}(A)$. By the definition of spark, there exist $p$ columns of $A$ that are linearly dependent. Let $A_S$ be the submatrix consisting of these $p$ columns. Without loss of generality, we assume $A_S = (a_1, a_2, \dots, a_p)$. Thus the $p \times p$ matrix $G_{SS} = A_S^T A_S$ is singular, since the columns of $A_S$ are linearly dependent. Note that all diagonal entries of $G_{SS}$ are equal to 1, and all off-diagonal absolute entries are less than or equal to $\mu(A)$. Under either condition (i) or (ii), we have $\alpha(A) \leq 1/\mu(A)$. Hence it follows from (1) that

$$1 + \alpha(A) \leq 1 + \frac{1}{\mu(A)} \leq \mathrm{Spark}(A).$$

So $\alpha(A) \leq \mathrm{Spark}(A) - 1 = p - 1$. Note that $G_{SS}$ is a $p \times p$ matrix. Thus in every row of $G_{SS}$, there exist at most $\alpha(A)$ absolute entries equal to $\mu(A)$, and the remaining $(p - 1) - \alpha(A)$ off-diagonal absolute entries are less than or equal to $\mu^{(2)}(A)$. By the singularity of $G_{SS}$, $\lambda = 0$ is an eigenvalue of $G_{SS}$.
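Note that the entries of $G_{SS}$ are given by $G_{ij} = a_i^T a_j$, where $i, j = 1, \dots, p$. Thus by Theorem 2.4, there exist two different rows, say the $i$th and $j$th rows ($i \neq j$), such that

$$|0 - G_{ii}| \cdot |0 - G_{jj}| \leq \Lambda_i \Lambda_j = \Big(\sum_{k=1, k \neq i}^{p} |a_i^T a_k|\Big)\Big(\sum_{k=1, k \neq j}^{p} |a_j^T a_k|\Big), \qquad (9)$$

where $G_{ii} = G_{jj} = 1$ are two diagonal entries of $G_{SS}$. By the definition of $\alpha(A)$ and $\beta(A)$, one of these two rows contains at most $\alpha(A)$ entries with absolute values equal to $\mu(A)$, and the other row contains at most $\beta(A)$ entries with absolute values equal to $\mu(A)$. The remaining off-diagonal entries in these rows are less than or equal to $\mu^{(2)}(A)$ in absolute value. Therefore,

$$\Big(\sum_{k=1, k \neq i}^{p} |a_i^T a_k|\Big)\Big(\sum_{k=1, k \neq j}^{p} |a_j^T a_k|\Big) \leq \big[\alpha(A)\mu(A) + (p - 1 - \alpha(A))\mu^{(2)}(A)\big] \cdot \big[\beta(A)\mu(A) + (p - 1 - \beta(A))\mu^{(2)}(A)\big]. \qquad (10)$$

Combining (9) and (10) leads to

$$1 \leq \big[\alpha(A)\mu(A) + (p - 1 - \alpha(A))\mu^{(2)}(A)\big] \cdot \big[\beta(A)\mu(A) + (p - 1 - \beta(A))\mu^{(2)}(A)\big] = \big[\alpha(A)\bar\mu(A) + (p - 1)\mu^{(2)}(A)\big] \cdot \big[\beta(A)\bar\mu(A) + (p - 1)\mu^{(2)}(A)\big],$$

where $\bar\mu(A) := \mu(A) - \mu^{(2)}(A)$. By rearranging terms, the inequality above can be written as

$$\big[(p - 1)\mu^{(2)}(A)\big]^2 + \big[(p - 1)\mu^{(2)}(A)\big]\big(\alpha(A) + \beta(A)\big)\bar\mu(A) + \alpha(A)\beta(A)\bar\mu(A)^2 - 1 \geq 0. \qquad (11)$$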

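We now show that $\mu^{(2)}(A) \neq 0$. In fact, if $\mu^{(2)}(A) = 0$, then (11) reduces to $\alpha(A)\beta(A)\bar\mu(A)^2 \geq 1$, which contradicts both conditions (i) and (ii). Indeed, since $0 \leq \bar\mu(A) \leq \mu(A)$ and $\beta(A) \leq \alpha(A)$, each of conditions (i) and (ii) implies that $\alpha(A)\beta(A)\bar\mu(A)^2 < 1$. Thus $\mu^{(2)}(A)$ is positive. Note that the quadratic equation (in $t$)

$$t^2 + t\big(\alpha(A) + \beta(A)\big)\bar\mu(A) + \alpha(A)\beta(A)\bar\mu(A)^2 - 1 = 0$$

has only one positive root, since its constant term is negative. So it follows from (11) that

$$(p - 1)\mu^{(2)}(A) \geq \frac{-(\alpha(A) + \beta(A))\bar\mu(A) + \sqrt{[(\alpha(A) + \beta(A))\bar\mu(A)]^2 - 4\big(\alpha(A)\beta(A)\bar\mu(A)^2 - 1\big)}}{2}$$

$$= \frac{-(\alpha(A) + \beta(A))\bar\mu(A) + \sqrt{[(\alpha(A) - \beta(A))\bar\mu(A)]^2 + 4}}{2} = \frac{2\big[1 - \alpha(A)\beta(A)\bar\mu(A)^2\big]}{(\alpha(A) + \beta(A))\bar\mu(A) + \sqrt{[(\alpha(A) - \beta(A))\bar\mu(A)]^2 + 4}}.$$

Dividing both sides by $\mu^{(2)}(A) > 0$ yields exactly the relation (8). □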
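The right-hand side of (8) depends only on the four scalars $\alpha(A)$, $\beta(A)$, $\mu(A)$ and $\mu^{(2)}(A)$. A small Python sketch (the function name is hypothetical) that checks the hypotheses of Theorem 2.5 and evaluates the bound; the usage line plugs in the values reported in Example 2.11 ($\alpha = \beta = 1$, $\mu = 0.7989$, $\mu^{(2)} = 0.4422$).

```python
import math


def spark_lower_bound_thm25(alpha, beta, mu, mu2):
    """Right-hand side of (8); raises ValueError if Theorem 2.5 does not apply."""
    cond_i = alpha < 1.0 / mu
    cond_ii = alpha <= 1.0 / mu and beta < alpha
    if not (cond_i or cond_ii):
        raise ValueError("need alpha < 1/mu, or alpha <= 1/mu and beta < alpha")
    if not 0.0 < mu2 < mu:
        raise ValueError("need 0 < mu2 < mu")
    mu_bar = mu - mu2
    numer = 2.0 * (1.0 - alpha * beta * mu_bar ** 2)
    denom = mu2 * (mu_bar * (alpha + beta)
                   + math.sqrt((mu_bar * (alpha - beta)) ** 2 + 4.0))
    return 1.0 + numer / denom


psi = spark_lower_bound_thm25(alpha=1, beta=1, mu=0.7989, mu2=0.4422)
print(round(psi, 4), round(1 + 1 / 0.7989, 4))  # 2.4548 2.2517: bound (8) vs bound (1)
```

Halving 2.4548 gives about 1.2274, the threshold quoted for (17) in Example 2.11, which is consistent with the fact that (8) simplifies when $\alpha(A) = \beta(A)$.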

The next proposition shows that (8) is an improved lower bound for Spark (A) under the 
condition of Theorem 2.5. 






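Proposition 2.6. Let $\Psi(\alpha(A), \beta(A), \mu(A), \mu^{(2)}(A))$ denote the right-hand side of the inequality (8). When $\alpha(A) < \frac{1}{\mu(A)}$, we have

$$\Psi(\alpha(A), \beta(A), \mu(A), \mu^{(2)}(A)) \geq \left(1 + \frac{1}{\mu(A)}\right) + \left(\frac{1}{\mu^{(2)}(A)} - \frac{1}{\mu(A)}\right)\big(1 - \alpha(A)\mu(A)\big).$$

When $\alpha(A) \leq \frac{1}{\mu(A)}$ and $\beta(A) < \alpha(A)$, we have

$$\Psi(\alpha(A), \beta(A), \mu(A), \mu^{(2)}(A)) \geq \left(1 + \frac{1}{\mu(A)}\right) + \left(\frac{1}{\mu^{(2)}(A)} - \frac{1}{\mu(A)}\right)\big(1 - \alpha(A)\mu(A)\big) + \frac{\alpha(A)\bar\mu(A)^2}{\mu^{(2)}(A)\big(1 + \alpha(A)\bar\mu(A)\big)},$$

where $\bar\mu(A) = \mu(A) - \mu^{(2)}(A)$.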

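Proof. By using the fact $\sqrt{a^2 + b^2} \leq a + b$ for any $a, b \geq 0$, together with $\alpha(A) \geq \beta(A)$, we have

$$\Psi(\alpha(A), \beta(A), \mu(A), \mu^{(2)}(A)) - 1 = \frac{2\big(1 - \alpha(A)\beta(A)\bar\mu(A)^2\big)}{\mu^{(2)}(A)\Big\{\bar\mu(A)\big(\alpha(A) + \beta(A)\big) + \sqrt{[\bar\mu(A)(\alpha(A) - \beta(A))]^2 + 4}\Big\}}$$

$$\geq \frac{2\big(1 - \alpha(A)\beta(A)\bar\mu(A)^2\big)}{\mu^{(2)}(A)\Big\{\bar\mu(A)\big(\alpha(A) + \beta(A)\big) + \bar\mu(A)\big(\alpha(A) - \beta(A)\big) + 2\Big\}} = \frac{1 - \alpha(A)\beta(A)\bar\mu(A)^2}{\mu^{(2)}(A)\big(1 + \alpha(A)\bar\mu(A)\big)}. \qquad (12)$$

Case 1: $\alpha(A) < 1/\mu(A)$. In this case, since $\beta(A) \leq \alpha(A)$ by (6), we have $1 - \alpha(A)\beta(A)\bar\mu(A)^2 \geq \big(1 - \alpha(A)\bar\mu(A)\big)\big(1 + \alpha(A)\bar\mu(A)\big)$, and it follows from (12) that

$$\Psi(\alpha(A), \beta(A), \mu(A), \mu^{(2)}(A)) - 1 \geq \frac{1 - \alpha(A)\bar\mu(A)}{\mu^{(2)}(A)} = \frac{1}{\mu(A)} + \left(\frac{1}{\mu^{(2)}(A)} - \frac{1}{\mu(A)}\right)\big(1 - \alpha(A)\mu(A)\big).$$

Case 2: $\alpha(A) \leq 1/\mu(A)$ and $\beta(A) < \alpha(A)$. In this case, since $\beta(A) \leq \alpha(A) - 1$, we have $1 - \alpha(A)\beta(A)\bar\mu(A)^2 \geq 1 - \alpha(A)^2\bar\mu(A)^2 + \alpha(A)\bar\mu(A)^2$, and it follows again from (12) that

$$\Psi(\alpha(A), \beta(A), \mu(A), \mu^{(2)}(A)) - 1 \geq \frac{1 - \alpha(A)\bar\mu(A)}{\mu^{(2)}(A)} + \frac{\alpha(A)\bar\mu(A)^2}{\mu^{(2)}(A)\big(1 + \alpha(A)\bar\mu(A)\big)}$$

$$= \frac{1}{\mu(A)} + \left(\frac{1}{\mu^{(2)}(A)} - \frac{1}{\mu(A)}\right)\big(1 - \alpha(A)\mu(A)\big) + \frac{\alpha(A)\bar\mu(A)^2}{\mu^{(2)}(A)\big(1 + \alpha(A)\bar\mu(A)\big)},$$

as desired. □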
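Under the first case above, we see that

$$\left(\frac{1}{\mu^{(2)}(A)} - \frac{1}{\mu(A)}\right)\big(1 - \alpha(A)\mu(A)\big) > 0,$$

and under the second case, we have

$$\left(\frac{1}{\mu^{(2)}(A)} - \frac{1}{\mu(A)}\right)\big(1 - \alpha(A)\mu(A)\big) + \frac{\alpha(A)\bar\mu(A)^2}{\mu^{(2)}(A)\big(1 + \alpha(A)\bar\mu(A)\big)} > 0.$$

Thus the next corollary follows immediately from Proposition 2.6.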
Corollary 2.7. Under the condition of Theorem 2.5, we have

$$\Psi(\alpha(A), \beta(A), \mu(A), \mu^{(2)}(A)) > 1 + \frac{1}{\mu(A)}.$$

Therefore, the lower bound of spark given by (8) does improve the bound (1) when the coherence rank $\alpha(A)$ is small. Proposition 2.6 also indicates explicitly the minimum amount of this improvement.

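If the Gram matrix $G$ of the normalized $A$ has two rows containing $\alpha(A)$ entries with absolute values equal to $\mu(A)$, then $\alpha(A) = \beta(A)$, in which case the lower bound (8) can be simplified to

$$\Psi(\alpha(A), \alpha(A), \mu(A), \mu^{(2)}(A)) = \left(1 + \frac{1}{\mu(A)}\right) + \left(\frac{1}{\mu^{(2)}(A)} - \frac{1}{\mu(A)}\right)\big(1 - \alpha(A)\mu(A)\big).$$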

Note that $G$ has at most one absolute entry equal to $\mu(A)$ in each row if and only if $\alpha(A) = \beta(A) = 1$. In this special case, the condition $\alpha(A) < 1/\mu(A)$ holds trivially when $\mu(A) < 1$. Thus, the next corollary follows immediately from Theorem 2.5.

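Corollary 2.8. Let $A \in \mathbb{R}^{m \times n}$ be a matrix with $m < n$. If $\mu(A) < 1$ and $\alpha(A) = 1$, then $\mu^{(2)}(A) > 0$, and

$$\mathrm{Spark}(A) \geq 1 + \frac{1}{\mu(A)} + \left(\frac{1}{\mu^{(2)}(A)} - \frac{1}{\mu(A)}\right)\big(1 - \mu(A)\big).$$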



Although Corollary 2.8 deals with a special case from a mathematical point of view, many matrices satisfy $\alpha(A) = 1$ together with $\mu(A) < 1$. Numerical experiments show that when a matrix is randomly generated, its coherence rank is most likely to be 1. In fact, the case $\alpha(A) \geq 2$ arises only when $A$ has at least two columns that form exactly the same angle with a common column of the matrix, and this angle is the one attaining the mutual coherence $\mu(A)$. This indicates that the coherence rank of a matrix is usually low in practice, typically $\alpha(A) = 1$.

2.3 Uniqueness via coherence and coherence rank 

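Consider the class of matrices

$$\mathcal{M} = \left\{A \in \mathbb{R}^{m \times n} : \text{either } \alpha(A) \leq \frac{1}{\mu(A)} \text{ and } \beta(A) < \alpha(A), \text{ or } \alpha(A) < \frac{1}{\mu(A)}\right\} = \mathcal{M}_1 \cup \mathcal{M}_2, \qquad (13)$$

where

$$\mathcal{M}_1 = \left\{A \in \mathbb{R}^{m \times n} : \alpha(A) < \frac{1}{\mu(A)}\right\}, \qquad \mathcal{M}_2 = \left\{A \in \mathbb{R}^{m \times n} : \alpha(A) \leq \frac{1}{\mu(A)} \text{ and } \beta(A) < \alpha(A)\right\}.$$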

We now state the main uniqueness claim of this section. 

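Theorem 2.9. Let $A \in \mathcal{M}$, defined by (13). If the system $Ax = b$ has a solution $x$ obeying

$$\|x\|_0 < \frac{1}{2}\left(1 + \frac{2\big[1 - \alpha(A)\beta(A)\bar\mu(A)^2\big]}{\mu^{(2)}(A)\Big\{\bar\mu(A)\big(\alpha(A) + \beta(A)\big) + \sqrt{[\bar\mu(A)(\alpha(A) - \beta(A))]^2 + 4}\Big\}}\right), \qquad (14)$$

where $\bar\mu(A) := \mu(A) - \mu^{(2)}(A)$, then $x$ is the unique sparsest solution to the linear system.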



This result follows immediately from Theorems 2.5 and 1.1. As shown by Proposition 2.6, condition (14) improves the well-known condition (2) when $A$ is in the class $\mathcal{M}$. This improvement is achieved by using the sub-mutual coherence $\mu^{(2)}(A)$ together with the (sub-)coherence rank, instead of $\mu(A)$ only. Note that $\alpha(A)$, $\beta(A)$, $\mu(A)$ and $\mu^{(2)}(A)$ can be obtained directly from the Gram matrix $G = A^T A$. Thus the bound (14) can be easily computed.



By Theorem 2.5 and Proposition 2.6, we obtain the next result. 

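Theorem 2.10. (i) Let $A \in \mathcal{M}_1$, defined by (13). If the system $Ax = b$ has a solution $x$ obeying

$$\|x\|_0 < \frac{1}{2}\left[1 + \frac{1}{\mu(A)} + \left(\frac{1}{\mu^{(2)}(A)} - \frac{1}{\mu(A)}\right)\big(1 - \alpha(A)\mu(A)\big)\right], \qquad (15)$$

then $x$ is the unique sparsest solution of the linear system.

(ii) Let $A \in \mathcal{M}_2$, defined by (13). If the system $Ax = b$ has a solution $x$ obeying

$$\|x\|_0 < \frac{1}{2}\left[1 + \frac{1}{\mu(A)} + \left(\frac{1}{\mu^{(2)}(A)} - \frac{1}{\mu(A)}\right)\big(1 - \alpha(A)\mu(A)\big) + \frac{\alpha(A)\bar\mu(A)^2}{\mu^{(2)}(A)\big(1 + \alpha(A)\bar\mu(A)\big)}\right], \qquad (16)$$

where $\bar\mu(A) = \mu(A) - \mu^{(2)}(A)$, then $x$ is the unique sparsest solution of the linear system.

(iii) Let $A$ be a matrix with $\mu(A) < 1$ and $\alpha(A) = 1$. Then a solution $x$ of $Ax = b$ obeying

$$\|x\|_0 < \frac{1}{2}\left[1 + \frac{1}{\mu(A)} + \left(\frac{1}{\mu^{(2)}(A)} - \frac{1}{\mu(A)}\right)\big(1 - \mu(A)\big)\right] \qquad (17)$$

is the unique sparsest solution of the linear system.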



Result (iii) of the above theorem shows that for coherence-rank-1 matrices, the uniqueness criterion (2) can always be improved to (17). As we have pointed out, matrices (especially randomly generated ones) are largely coherence-rank-1, unless the matrix is specially designed.

Example 2.11. Consider the randomly generated matrix $A$ below and the absolute Gram matrix of its column-normalized counterpart:

$$A = \begin{pmatrix} 0.0010 & -0.7998 & -0.6002 & 0.0717 \\ 0.8001 & -0.3558 & 0.4798 & -0.1913 \\ 0.5999 & 0.4801 & -0.6398 & -0.6412 \end{pmatrix}, \qquad \mathrm{abs}(G) = \begin{pmatrix} 1 & 0.0025 & 0.0005 & 0.7989 \\ 0.0025 & 1 & 0.0022 & 0.4422 \\ 0.0005 & 0.0022 & 1 & 0.4093 \\ 0.7989 & 0.4422 & 0.4093 & 1 \end{pmatrix}.$$

From $\mathrm{abs}(G)$, we see that $\alpha(A) = \beta(A) = 1$, $\mu(A) = 0.7989$, and $\mu^{(2)}(A) = 0.4422$. Note that $\mathrm{Spark}(A)/2 = 2$ for this example. The standard mutual coherence bound (2) is $(1 + 1/0.7989)/2 \approx 1.1259$, which is improved to 1.2274 by (17).
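Example 2.11 can be reproduced numerically. The sketch below (Python/NumPy; the helper name `mu_mu2_alpha` and the equality tolerance are our assumptions) computes $\mu(A)$, $\mu^{(2)}(A)$ and $\alpha(A)$ from the column-normalized Gram matrix and then evaluates the thresholds (2) and (17).

```python
import numpy as np


def mu_mu2_alpha(A, tol=1e-9):
    """Return mu(A), mu^(2)(A) and the coherence rank alpha(A)."""
    An = A / np.linalg.norm(A, axis=0)
    G = np.abs(An.T @ An)
    np.fill_diagonal(G, -1.0)                  # exclude the diagonal from all maxima
    mu = G.max()
    hits = np.abs(G - mu) <= tol               # entries equal to mu (up to tol)
    mu2 = G[~hits & (G >= 0)].max()            # largest off-diagonal entry below mu
    return mu, mu2, int(hits.sum(axis=1).max())


A = np.array([[0.0010, -0.7998, -0.6002, 0.0717],
              [0.8001, -0.3558, 0.4798, -0.1913],
              [0.5999, 0.4801, -0.6398, -0.6412]])
mu, mu2, alpha = mu_mu2_alpha(A)
bound_2 = (1 + 1 / mu) / 2
# Threshold (17) of Theorem 2.10(iii); valid because alpha(A) = 1 and mu(A) < 1.
assert alpha == 1 and mu < 1
bound_17 = (1 + 1 / mu + (1 / mu2 - 1 / mu) * (1 - mu)) / 2
print(round(mu, 4), round(mu2, 4), round(bound_2, 4), round(bound_17, 4))
```

Up to rounding in the last digit, the printed values agree with those quoted in the example.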



3 Improvement of Babel-function-based uniqueness 

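Let $A \in \mathbb{R}^{m \times n}$ be a matrix with normalized columns. Tropp [32] introduced the so-called Babel function, defined as

$$\mu_1(q) = \max_{|\Lambda| = q} \ \max_{j \notin \Lambda} \sum_{i \in \Lambda} |a_i^T a_j|,$$

where $a_k$, $k = 1, \dots, n$, are the columns of $A$, and $\Lambda$ is a subset of $\{1, \dots, n\}$. Using this function, the following lower bound for spark is obtained (see [32]):

$$\mathrm{Spark}(A) \geq \min_{1 \leq q \leq n} \{q : \mu_1(q - 1) \geq 1\}. \qquad (18)$$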

The Babel function can be equivalently defined and computed in terms of the Gram matrix $G = A^T A$. In fact, sorting every row of $\mathrm{abs}(G)$ in descending order yields the matrix $\hat G = (\hat G_{ij})$ whose first column is the vector of ones, consisting of the diagonal entries of $G$. Therefore, as pointed out in [18], the Babel function can be written as

$$\mu_1(q) = \max_{1 \leq k \leq n} \sum_{j=2}^{q+1} |\hat G_{kj}| = \sum_{j=2}^{q+1} |\hat G_{k_0 j}|, \qquad (19)$$

where $k_0$ denotes an index at which the above maximum is achieved. Since $\mu_1(q - 1) \leq (q - 1)\mu(A)$, it is evident that

$$\min_{1 \leq q \leq n} \{q : \mu_1(q - 1) \geq 1\} \geq 1 + \frac{1}{\mu(A)}.$$

So the lower bound given by (18) is an enhanced version of (1). Some immediate questions arise: Can we compare the lower bounds (18) and (8)? Can the lower bounds (18) and (8) be further improved?

We first address the second question above by showing that the Babel-function-based bound (18) can be further improved by using the so-called sub-Babel function. Again, Brauer's theorem plays a fundamental role in deriving such an enhanced result. The sub-Babel function, denoted by $\mu_1^{(2)}(q)$, is defined as

$$\mu_1^{(2)}(q) = \max_{1 \leq k \leq n,\ k \neq k_0} \sum_{j=2}^{q+1} |\hat G_{kj}|, \qquad (20)$$

where $k_0$ is determined in (19). Clearly, we have

$$\mu_1^{(2)}(q) \leq \mu_1(q) \quad \text{for any } 1 \leq q \leq n - 1. \qquad (21)$$

We have the following improved version of (18). 

Theorem 3.1. For any matrix $A \in \mathbb{R}^{m \times n}$, we have

$$\mathrm{Spark}(A) \geq \min_{1 \leq q \leq n} \left\{q : \mu_1(q - 1) \cdot \mu_1^{(2)}(q - 1) \geq 1\right\}. \qquad (22)$$
Proof. Let $p = \mathrm{Spark}(A)$. Then there exist $p$ columns of $A$ that are linearly dependent. Without loss of generality, we assume $A_S = (a_1, a_2, \dots, a_p)$ is the submatrix consisting of these $p$ columns. Since the columns of $A_S$ are linearly dependent and normalized, the $p \times p$ matrix $G_{SS} = A_S^T A_S$ is singular, and all diagonal entries of $G_{SS}$ are equal to 1. Thus by Theorem 2.4 (Brauer's theorem), for any eigenvalue $\lambda$ of $G_{SS}$, there exist two different rows, say the $i$th and $j$th rows ($i \neq j$), such that

$$|\lambda - G_{ii}| \cdot |\lambda - G_{jj}| \leq \Lambda_i \Lambda_j = \Big(\sum_{k=1, k \neq i}^{p} |a_i^T a_k|\Big)\Big(\sum_{k=1, k \neq j}^{p} |a_j^T a_k|\Big), \qquad (23)$$

where $G_{ii} = G_{jj} = 1$ are two diagonal entries of $G_{SS}$. By the definition of the Babel and sub-Babel functions, we see that

$$\max\{\Lambda_i, \Lambda_j\} \leq \mu_1(p - 1), \qquad \min\{\Lambda_i, \Lambda_j\} \leq \mu_1^{(2)}(p - 1).$$

Thus it follows from (23) that

$$|\lambda - 1|^2 \leq \Lambda_i \Lambda_j = \max\{\Lambda_i, \Lambda_j\} \cdot \min\{\Lambda_i, \Lambda_j\} \leq \mu_1(p - 1) \cdot \mu_1^{(2)}(p - 1).$$

In particular, since $\lambda = 0$ is an eigenvalue of $G_{SS}$, we have

$$\mu_1(p - 1) \cdot \mu_1^{(2)}(p - 1) \geq 1. \qquad (24)$$

So $p = \mathrm{Spark}(A)$ must satisfy (24). Therefore,

$$\mathrm{Spark}(A) = p \geq \min_{1 \leq q \leq n} \left\{q : \mu_1(q - 1) \cdot \mu_1^{(2)}(q - 1) \geq 1\right\},$$

as desired. □
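Both (18) and (22) are simple to evaluate from the row-sorted absolute Gram matrix $\hat G$: by (19) and (20), $\mu_1(q)$ is the largest of the row sums $\sum_{j=2}^{q+1} \hat G_{kj}$, and $\mu_1^{(2)}(q)$ is the largest over the remaining rows, i.e., the second largest row sum. The following NumPy sketch (the function name is ours) returns the two spark lower bounds; the usage example again takes the matrix of Example 2.11, whose spark is 4.

```python
import numpy as np


def babel_bounds(A):
    """Return (bound (18), bound (22)) on Spark(A) via the Gram-matrix form (19)-(20)."""
    An = A / np.linalg.norm(A, axis=0)
    n = An.shape[1]
    G_hat = -np.sort(-np.abs(An.T @ An), axis=1)   # each row sorted in descending order
    # cums[k, q] = sum of the q largest off-diagonal entries of row k
    cums = np.concatenate([np.zeros((n, 1)), np.cumsum(G_hat[:, 1:], axis=1)], axis=1)
    q_babel = q_star = None
    for q in range(1, n + 1):
        row_sums = np.sort(cums[:, q - 1])[::-1]
        mu1, mu1_sub = row_sums[0], row_sums[1]     # Babel and sub-Babel at q - 1
        if q_babel is None and mu1 >= 1:
            q_babel = q
        if q_star is None and mu1 * mu1_sub >= 1:
            q_star = q
    return q_babel, q_star


A = np.array([[0.0010, -0.7998, -0.6002, 0.0717],
              [0.8001, -0.3558, 0.4798, -0.1913],
              [0.5999, 0.4801, -0.6398, -0.6412]])
print(babel_bounds(A))   # (3, 4): here bound (22) equals Spark(A) = 4
```

`None` is returned for a bound whose defining condition never holds for $q \leq n$.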

The next proposition shows that the lower bound (22) is an improved version of (18). 
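Proposition 3.2. Denote by

$$q^* = \min_{1 \leq q \leq n} \left\{q : \mu_1(q - 1) \cdot \mu_1^{(2)}(q - 1) \geq 1\right\}, \qquad \hat q = \min_{1 \leq q \leq n} \{q : \mu_1(q - 1) \geq 1\}.$$

Then $q^* \geq \hat q$. In particular, if $\mu_1^{(2)}(\hat q - 1) < 1/\mu_1(\hat q - 1)$, then $q^* > \hat q$.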

Proof. By the definition of $q^*$, we see that $\mu_1(q^* - 1) \cdot \mu_1^{(2)}(q^* - 1) \geq 1$. This, together with (21), implies that $\mu_1(q^* - 1) \geq 1$. Thus

$$q^* \geq \min_{1 \leq q \leq n} \{q : \mu_1(q - 1) \geq 1\} = \hat q.$$

We now further show that this inequality holds strictly when the values of the sub-Babel function are relatively small, in the sense that $\mu_1^{(2)}(\hat q - 1) < 1/\mu_1(\hat q - 1)$. In fact, under this condition, we have

$$\mu_1(\hat q - 1) \cdot \mu_1^{(2)}(\hat q - 1) < 1.$$

Note that both $\mu_1(q - 1)$ and $\mu_1^{(2)}(q - 1)$ are nondecreasing functions of $q$. The inequality above shows that whenever $\mu_1(q - 1) \cdot \mu_1^{(2)}(q - 1) \geq 1$, we must have $q > \hat q$. Therefore,

$$q^* = \min_{1 \leq q \leq n} \left\{q : \mu_1(q - 1) \cdot \mu_1^{(2)}(q - 1) \geq 1\right\} > \hat q,$$

which shows that (22) improves (18) in this case. □

The next proposition indicates that when the coherence rank of A is relatively small, bound 
(22) is also an improved version of (8). 

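Proposition 3.3. Let $A \in \mathbb{R}^{m \times n}$ be a given matrix, and let $q^*$ be defined as in Proposition 3.2. If $\alpha(A) < 1/\mu(A)$ and $\alpha(A) \leq q^* - 1$, then

$$q^* \geq 1 + \frac{2\big[1 - \alpha(A)\beta(A)\bar\mu(A)^2\big]}{\mu^{(2)}(A)\Big\{\bar\mu(A)\big(\alpha(A) + \beta(A)\big) + \sqrt{[\bar\mu(A)(\alpha(A) - \beta(A))]^2 + 4}\Big\}},$$

where $\bar\mu(A) := \mu(A) - \mu^{(2)}(A)$.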

Proof. Since $\alpha(A) \leq q^* - 1$, by the definitions of $\alpha(A)$ and $\beta(A)$, it follows from (19) and (20) that

$$\mu_1(q^* - 1) \leq \alpha(A)\mu(A) + (q^* - 1 - \alpha(A))\mu^{(2)}(A),$$

$$\mu_1^{(2)}(q^* - 1) \leq \beta(A)\mu(A) + (q^* - 1 - \beta(A))\mu^{(2)}(A).$$

These relations, together with the definition of $q^*$, imply that

$$1 \leq \mu_1(q^* - 1) \cdot \mu_1^{(2)}(q^* - 1) \leq \big[\alpha(A)\mu(A) + (q^* - 1 - \alpha(A))\mu^{(2)}(A)\big] \cdot \big[\beta(A)\mu(A) + (q^* - 1 - \beta(A))\mu^{(2)}(A)\big].$$

Thus we obtain the same inequality as (11) with $p$ replaced by $q^*$. Following (11) and repeating the argument in the proof of Theorem 2.5, we deduce that

$$q^* \geq 1 + \frac{2\big[1 - \alpha(A)\beta(A)\bar\mu(A)^2\big]}{\mu^{(2)}(A)\Big\{\bar\mu(A)\big(\alpha(A) + \beta(A)\big) + \sqrt{[\bar\mu(A)(\alpha(A) - \beta(A))]^2 + 4}\Big\}},$$

where $\bar\mu(A) := \mu(A) - \mu^{(2)}(A)$. □

It is also worth briefly comparing the Babel-function-based bound (18) with those developed in section 2 of this paper. At first glance, (18) seems more sophisticated than those developed in section 2. However, the two types of bounds are mutually independent in the sense that neither dominates the other in general. For example, when $\alpha(A) \leq \hat q - 1$ and $\alpha(A) < 1/\mu(A)$, where $\hat q$ is defined in Proposition 3.2, we have

$$1 \leq \mu_1(\hat q - 1) = \max_{1 \leq k \leq n} \sum_{j=2}^{\hat q} |\hat G_{kj}| \leq \alpha(A)\mu(A) + (\hat q - 1 - \alpha(A))\mu^{(2)}(A).$$

Thus,

$$\hat q \geq 1 + \frac{1 - \alpha(A)\bar\mu(A)}{\mu^{(2)}(A)} = 1 + \frac{1}{\mu(A)} + \left(\frac{1}{\mu^{(2)}(A)} - \frac{1}{\mu(A)}\right)\big(1 - \alpha(A)\mu(A)\big).$$
In this case, the Babel-function-based bound (18) is tighter than bound (15). However, when $\hat q - 1 < \alpha(A)$, the relationship between the bounds (15) and (18) can be complicated. The bound (15) and the one in Theorem 2.9 might be tighter than (18). Indeed, let us assume that $\hat q - 1 < \alpha(A) \leq p - 1$, where $p = \mathrm{Spark}(A)$, and $\alpha(A) < 1/\mu(A)$. Then (15) indicates that

$$p = 1 + \frac{1}{\mu(A)} + \left(\frac{1}{\mu^{(2)}(A)} - \frac{1}{\mu(A)}\right)\big(1 - \alpha(A)\mu(A)\big) + t^*$$

for some number $t^* \geq 0$. This can be written as

$$\hat q = 1 + \frac{1}{\mu(A)} + \left(\frac{1}{\mu^{(2)}(A)} - \frac{1}{\mu(A)}\right)\big(1 - \alpha(A)\mu(A)\big) + t^* - (p - \hat q).$$

If $t^* < p - \hat q$, then the above equality implies that

$$\hat q < 1 + \frac{1}{\mu(A)} + \left(\frac{1}{\mu^{(2)}(A)} - \frac{1}{\mu(A)}\right)\big(1 - \alpha(A)\mu(A)\big).$$

By Proposition 2.6, the right-hand side of the above is dominated by $\Psi(\alpha(A), \beta(A), \mu(A), \mu^{(2)}(A))$. Therefore, as a lower bound of spark, (8) is tighter than (18) in this case.

4 Scaled mutual coherence 

The mutual coherence is not only important for the development of the uniqueness theory of sparsest solutions, but also crucial for the performance guarantees and stability analysis of many sparsity-seeking algorithms, such as basis pursuit, orthogonal matching pursuit, and thresholding algorithms (see e.g., [12, 19, 15, 22, 32, 33, 16, 8, 18, 21]). The Babel function [15, 32], fusion coherence [6] and block coherence [20] are several variants of the mutual coherence. In this section, we introduce the scaled mutual coherence, which may lead to an optimal coherence-based estimate of the spark in a certain sense. In theory, the improved results established in the previous sections can be either extended or further improved by choosing a suitable scaling matrix.

4.1 Uniqueness via the scaled mutual coherence 

Note that $\mathrm{Spark}(A)$, where $A \in \mathbb{R}^{m \times n}$ with $m < n$, is invariant under a nonsingular linear transformation in the sense that

$$\mathrm{Spark}(A) = \mathrm{Spark}(WA)$$

for any nonsingular matrix $W \in \mathbb{R}^{m \times m}$. However, the mutual coherence $\mu(A)$ is not. That is,

$$\mu(A) \neq \mu(WA)$$

in general (see Examples 4.4 and 4.5 in this section). Thus the improved conditions (14)-(17) still leave room for a further improvement by using a suitable nonsingular scaling $W$. Motivated by this observation, we consider the weighted inner product between every pair of columns of a matrix, and define

$$\mu_W(A) = \max_{i \neq j} \frac{|a_i^T W^T W a_j|}{\|W a_i\|_2 \cdot \|W a_j\|_2} = \mu(WA).$$
Similarly, we define

$$\mu_W^{(2)}(A) = \max_{i \neq j}\left\{\frac{|a_i^T W^T W a_j|}{\|W a_i\|_2 \cdot \|W a_j\|_2} : \frac{|a_i^T W^T W a_j|}{\|W a_i\|_2 \cdot \|W a_j\|_2} < \mu_W(A)\right\}.$$

In this paper, $\mu_W(A)$ and $\mu_W^{(2)}(A)$ are referred to as the scaled mutual coherence and the scaled sub-mutual coherence, respectively. It makes sense to introduce the next definition.

Definition 4.1. Let

$$\mu^*(A) := \min\{\mu_W(A) : W \in \mathbb{R}^{m \times m} \text{ is nonsingular}\}.$$

$\mu^*(A)$ is called the optimal scaled mutual coherence (OSMC) of $A$.

By definition, we have $\mu^*(A) \leq \mu_W(A)$ for any nonsingular $W \in \mathbb{R}^{m \times m}$ and any $A \in \mathbb{R}^{m \times n}$. In particular, by setting $W = I$ (the identity matrix), we see that $\mu^*(A) \leq \mu(A)$ for any $A$. As shown by the next result, the OSMC provides a theoretical lower bound for the spark that is better than any other scaled-mutual-coherence-based bound.

Theorem 4.2. For any $m\times n$ ($m<n$) matrix $A$ with nonzero columns, we have $\mu^*(A)>0$,
and

$$1+\frac{1}{\mu_W(A)}\le1+\frac{1}{\mu^*(A)}\le\mathrm{Spark}(A)$$

for any nonsingular matrix $W\in\mathbb{R}^{m\times m}$. Hence if the system $Ax=b$ has a solution satisfying

$$\|x\|_0<\frac{1}{2}\left(1+\frac{1}{\mu^*(A)}\right),$$

or more restrictively, if there is a nonsingular matrix $W$ such that $\|x\|_0<(1+1/\mu_W(A))/2$, then
$x$ is the unique sparsest solution to the linear system.

Proof. Let $W$ be an arbitrary nonsingular matrix. We consider the scaled matrix $WA$. Let
$D=\mathrm{diag}(1/\|Wa_1\|_2,\dots,1/\|Wa_n\|_2)$, where $a_i$, $i=1,\dots,n$, are the columns of $A$. Then $WAD$ is
a matrix with normalized columns. Clearly, this normalization does not change the spark (hence
$\mathrm{Spark}(WAD)=\mathrm{Spark}(WA)=\mathrm{Spark}(A)$). We also note that $WAD(D^{-1}x)=Wb$ and
$Ax=b$ have the same sparsity of solutions. So without loss of generality, we assume that all
columns of $WA$ have unit $\ell_2$-norms. Let $p=\mathrm{Spark}(A)$. By definition, there exist $p$ columns of $A$
that are linearly dependent. Let $A_S$ consist of these $p$ columns. Then the matrix

$$G_S^{(W)}:=(WA_S)^T(WA_S)=A_S^TW^TWA_S$$

is a $p\times p$ singular matrix due to the linear dependence of the columns of $A_S$. Since $WA$ is normalized,
all diagonal entries of $G_S^{(W)}$ are equal to 1, and the absolute values of its off-diagonal entries are less than or equal to
$\mu_W(A)$. By the singularity of $G_S^{(W)}$, this matrix has a zero eigenvalue. Thus by Gerschgorin's
theorem, there exists a row of the matrix, say the $i$th row, such that

$$1\le\sum_{j\neq i}\left|\left(G_S^{(W)}\right)_{ij}\right|\le(p-1)\mu_W(A),$$

which implies that

$$\mu_W(A)\ge1/(p-1)>0.$$

Note that the inequality above holds for any nonsingular matrix $W\in\mathbb{R}^{m\times m}$. Taking the infimum
of the left-hand side over all such $W$ yields $\mu^*(A)\ge1/(p-1)>0$, and thus

$$p\ (=\mathrm{Spark}(A))\ge1+\frac{1}{\mu^*(A)}.$$

The right-hand side of the inequality above is greater than or equal to $1+1/\mu_W(A)$ for any nonsingular
$W$, since $\mu^*(A)\le\mu_W(A)$. The uniqueness of sparsest solutions of the linear system $Ax=b$
follows immediately from Theorem 1.1. □
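The two facts used above, $\mathrm{Spark}(WA)=\mathrm{Spark}(A)$ and $1+1/\mu_W(A)\le\mathrm{Spark}(A)$, can be checked numerically on small matrices. The Python code below is a minimal sketch: it computes the spark by brute force using exact rational rank via `fractions.Fraction`, so it is only practical for small $n$ and integer or rational entries. The helper names and the particular scaling $W$ (an arbitrary integer matrix with $\det W=2$) are illustrative assumptions; $A$ is the matrix of Example 4.8 with its $(3,4)$ entry taken to be $0$.

```python
import itertools
import math
from fractions import Fraction


def rank(rows):
    """Exact rank of a rational matrix (list of rows) by Gauss-Jordan elimination."""
    M = [[Fraction(x) for x in r] for r in rows]
    ncols = len(M[0]) if M else 0
    r = 0
    for c in range(ncols):
        piv = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * e for a, e in zip(M[i], M[r])]
        r += 1
    return r


def spark(A):
    """Smallest number of linearly dependent columns (brute force over subsets)."""
    n = len(A[0])
    for p in range(1, n + 1):
        for S in itertools.combinations(range(n), p):
            if rank([[row[j] for j in S] for row in A]) < p:
                return p
    return math.inf  # all columns are linearly independent


def coherence(M):
    """Mutual coherence of the (nonzero) columns of M, in floating point."""
    n = len(M[0])
    cols = [[row[j] for row in M] for j in range(n)]
    norms = [math.sqrt(sum(v * v for v in c)) for c in cols]
    return max(abs(sum(p * q for p, q in zip(cols[i], cols[j]))) / (norms[i] * norms[j])
               for i in range(n) for j in range(n) if i != j)


def matmul(W, A):
    return [[sum(W[i][k] * A[k][j] for k in range(len(A))) for j in range(len(A[0]))]
            for i in range(len(W))]


A = [[1, -3, -6, 4, -3],
     [2, 3, -2, -2, 3],
     [3, -2, 1, 0, 4]]
W = [[1, 1, 0],
     [0, 1, 1],
     [1, 0, 1]]
WA = matmul(W, A)

print(spark(A), spark(WA))                           # the scaling leaves the spark unchanged
print(1 + 1 / coherence(A), 1 + 1 / coherence(WA))   # two lower bounds on the spark
```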

With a scaling matrix $W$, we denote the scaled coherence rank and scaled sub-coherence rank
by $\alpha_W(A)=\alpha(WA)$ and $\beta_W(A)=\beta(WA)$, respectively. By applying the proof of Theorem
2.5 to the scaled matrix $WA$, the lower bound of the spark, together with the uniqueness conditions for
sparsest solutions in section 2, can be stated in terms of $\mu_W(A)$, $\mu_W^{(2)}(A)$, $\alpha_W(A)$ and $\beta_W(A)$. First,
we define a class of matrices, denoted by $\widehat{\mathcal{M}}$, as follows.



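$$\widehat{\mathcal{M}}=\left\{A\in\mathbb{R}^{m\times n}:\ \text{there is a nonsingular } W\in\mathbb{R}^{m\times m} \text{ such that either } \alpha_W(A)\le\frac{1}{\mu_W(A)} \text{ and } \beta_W(A)<\alpha_W(A), \text{ or } \alpha_W(A)<\frac{1}{\mu_W(A)}\right\}. \qquad (25)$$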



We now state the counterparts of Theorems 2.9 and 2.10 and Corollary 2.12 in terms of the scaled
coherence and the scaled coherence rank.

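Theorem 4.3. (i) Let $W\in\mathbb{R}^{m\times m}$ be a nonsingular matrix for which one of the two conditions in
(25) holds (so that $A\in\widehat{\mathcal{M}}$). If the system $Ax=b$ has a solution $x$ such that

$$\|x\|_0<\frac{1}{2}\left(1+\frac{2\left(1-\alpha_W(A)\beta_W(A)\tilde{\mu}_W(A)^2\right)}{\mu_W^{(2)}(A)\left\{\tilde{\mu}_W(A)\left(\alpha_W(A)+\beta_W(A)\right)+\sqrt{\left[\tilde{\mu}_W(A)\left(\alpha_W(A)-\beta_W(A)\right)\right]^2+4}\right\}}\right),$$

where $\tilde{\mu}_W(A):=\mu_W(A)-\mu_W^{(2)}(A)$, then $x$ is the unique sparsest solution to the linear system. In
particular, the conclusion is valid if the following condition holds:

$$\|x\|_0<\frac{1}{2}\left(1+\frac{1}{\mu_W(A)}+\left(\frac{1}{\mu_W^{(2)}(A)}-\frac{1}{\mu_W(A)}\right)\left(1-\alpha_W(A)\mu_W(A)\right)\right).$$

(ii) Suppose that $W\in\mathbb{R}^{m\times m}$ is nonsingular and such that $\mu_W(A)<1$ and $\alpha_W(A)=1$. Then if
the solution $x$ of $Ax=b$ obeys

$$\|x\|_0<\frac{1}{2}\left(1+\frac{1}{\mu_W(A)}+\left(\frac{1}{\mu_W^{(2)}(A)}-\frac{1}{\mu_W(A)}\right)\left(1-\mu_W(A)\right)\right), \qquad (26)$$

$x$ is the unique sparsest solution to the linear system.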
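To make these thresholds easy to evaluate, the Python code below transcribes the first bound of part (i), its simplified form, and (26), as written in Theorem 4.3; the function names are illustrative. Setting $\alpha_W(A)=\beta_W(A)=1$ in the first bound gives $\frac12\bigl(1+(1-\tilde\mu_W(A))/\mu_W^{(2)}(A)\bigr)$, which is algebraically identical to (26); the sketch confirms this numerically with the scaled values $\mu_W(A)=0.8343$ and $\mu_W^{(2)}(A)=0.7272$ reported in Example 4.5.

```python
import math


def bound_i(mu, mu2, alpha, beta):
    """First threshold of Theorem 4.3(i); arguments are the scaled quantities."""
    mt = mu - mu2  # tilde-mu
    root = math.sqrt((mt * (alpha - beta)) ** 2 + 4)
    return 0.5 * (1 + 2 * (1 - alpha * beta * mt ** 2) / (mu2 * (mt * (alpha + beta) + root)))


def bound_i_simple(mu, mu2, alpha):
    """The 'in particular' threshold of Theorem 4.3(i)."""
    return 0.5 * (1 + 1 / mu + (1 / mu2 - 1 / mu) * (1 - alpha * mu))


def bound_26(mu, mu2):
    """Threshold (26) of Theorem 4.3(ii), i.e. the alpha = 1 case."""
    return bound_i_simple(mu, mu2, 1)


mu_w, mu2_w = 0.8343, 0.7272   # scaled values reported in Example 4.5
print(round(0.5 * (1 + 1 / mu_w), 4))   # Theorem 4.2 threshold
print(round(bound_26(mu_w, mu2_w), 4))  # threshold (26)
print(bound_i(mu_w, mu2_w, 1, 1) - bound_26(mu_w, mu2_w))  # ~0 up to rounding
```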



The next example shows that $\mathcal{M}\subsetneq\widehat{\mathcal{M}}$, i.e., $\widehat{\mathcal{M}}$ is strictly larger than $\mathcal{M}$. Hence Theorem
4.3 covers a broader class of matrices than its counterparts in section 2.3, and with a suitable
scaling, Theorem 4.3 can further improve the results in section 2. In fact, when
$\alpha(A)\le1/\mu(A)$ does not hold (in which case $A\notin\mathcal{M}$), the scaled matrix $WA$ may satisfy the
condition $\alpha_W(A)<1/\mu_W(A)$ (so $WA\in\mathcal{M}$, and hence $A\in\widehat{\mathcal{M}}$), as shown by the next example.

Example 4.4. Consider the matrix (7) given in Remark 2.3. For this matrix, $\mu(A)=0.9239$,
$\mu^{(2)}(A)=0.7644$, $\alpha(A)=2$, $\beta(A)=1$, and the bound (2) is 1.0498. Note that this matrix does
not belong to $\mathcal{M}$, since $\alpha(A)>1/\mu(A)$. So (14)-(17) cannot be applied to this matrix. Now, we
randomly generate a scaling matrix as follows:

$$W=\begin{pmatrix}-0.9415&-0.5320&-0.4838\\-0.1623&1.6821&-0.7120\\-0.1461&-0.8757&-1.1742\end{pmatrix}.$$

It is easy to verify that $\mu_W(A)=0.8954$, $\mu_W^{(2)}(A)=0.8302$, and $\alpha_W(A)=\beta_W(A)=1$. In fact,
after this scaling, the absolute Gram matrix of the normalized $WA$ is given by



$$\mathrm{abs}(G_W)=\begin{pmatrix}
1.0000&0.3561&0.7138&0.8302&0.3978&0.8954\\
0.3561&1.0000&0.5753&0.8130&0.7126&0.0973\\
0.7138&0.5753&1.0000&0.8227&0.0177&0.4874\\
0.8302&0.8130&0.8227&1.0000&0.1707&0.4969\\
0.3978&0.7126&0.0177&0.1707&1.0000&0.7634\\
0.8954&0.0973&0.4874&0.4969&0.7634&1.0000
\end{pmatrix}.$$



Thus by this scaling, the original coherence rank $\alpha(A)=2$ is reduced to $\alpha_W(A)=1$, and hence
Theorem 4.3(ii) can be applied to $WA$. Note that the scaled bound $(1+1/\mu_W(A))/2$ in Theorem 4.2
is 1.0584 and the bound (26) in Theorem 4.3(ii) is 1.0630. Both improve the original unscaled
bound (2). This example shows that while $A\notin\mathcal{M}$, we have $WA\in\mathcal{M}$, and hence $A\in\widehat{\mathcal{M}}$.
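Matrix (7) is not repeated in this section, but the quantities quoted in Example 4.4 can be read straight off the printed $\mathrm{abs}(G_W)$. The Python code below is a minimal sketch that does so from the rounded four-decimal entries (the helper name and tolerance are illustrative) and evaluates the Theorem 4.2 threshold $(1+1/\mu_W(A))/2$ and (26).

```python
def stats_from_abs_gram(G, tol=1e-9):
    """(mu, mu2, alpha) from the absolute Gram matrix of normalized columns."""
    n = len(G)
    off = [G[i][j] for i in range(n) for j in range(n) if i != j]
    mu = max(off)
    mu2 = max((g for g in off if g < mu - tol), default=0.0)
    alpha = max(sum(1 for j in range(n) if j != i and abs(G[i][j] - mu) <= tol)
                for i in range(n))
    return mu, mu2, alpha


G_W = [[1.0000, 0.3561, 0.7138, 0.8302, 0.3978, 0.8954],
       [0.3561, 1.0000, 0.5753, 0.8130, 0.7126, 0.0973],
       [0.7138, 0.5753, 1.0000, 0.8227, 0.0177, 0.4874],
       [0.8302, 0.8130, 0.8227, 1.0000, 0.1707, 0.4969],
       [0.3978, 0.7126, 0.0177, 0.1707, 1.0000, 0.7634],
       [0.8954, 0.0973, 0.4874, 0.4969, 0.7634, 1.0000]]

mu_w, mu2_w, alpha_w = stats_from_abs_gram(G_W)
bound_42 = 0.5 * (1 + 1 / mu_w)                                       # Theorem 4.2
bound_26 = 0.5 * (1 + 1 / mu_w + (1 / mu2_w - 1 / mu_w) * (1 - mu_w))  # (26)
print(mu_w, mu2_w, alpha_w, round(bound_42, 4), round(bound_26, 4))
```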

From simulations, we observe that when the coherence rank of a matrix is high in the sense that
$\alpha(A)\ge2$, it is quite sensitive to a scaling $W$, which may immediately reduce $\alpha(A)$ to $\alpha_W(A)=1$,
as shown by the above example. When the coherence rank is $\alpha(A)=1$, it is insensitive to a scaling
$W$, and it is highly likely that $\alpha_W(A)$ remains 1.

Example 4.5. Consider $A$ and the absolute Gram matrix abs(G) of its normalized counterpart

$$A=\begin{pmatrix}0.0010&-0.7998&-0.6002&1.4290\\0.8001&-0.3558&0.4798&1.2393\\0.5999&0.4801&-0.6398&-0.6849\end{pmatrix},\quad \mathrm{abs}(G)=\begin{pmatrix}1&0.0025&0.0005&0.2894\\0.0025&1&0.0022&0.9523\\0.0005&0.0022&1&0.0870\\0.2894&0.9523&0.0870&1\end{pmatrix}.$$


For this example, $\alpha(A)=\beta(A)=1$, $\mu(A)=0.9523$, and $\mu^{(2)}(A)=0.2894$. The standard bound
(2) is $(1+1/\mu(A))/2=1.025$, which is improved to 1.0824 by (17). We now use the scaling matrix

$$W=\begin{pmatrix}-0.2078&0.9393&0.1905\\-0.9381&0.5715&0.3268\\0.6702&0.2228&0.7662\end{pmatrix},$$

which is a randomly generated nonsingular matrix. This scaling matrix yields $\mu_W(A)=0.8343$,
$\mu_W^{(2)}(A)=0.7272$, and $\alpha_W(A)=\beta_W(A)=1$. The original bound (2) can be further improved to the
bound $(1+1/\mu_W(A))/2=1.0993$, and to the bound (26), which equals 1.1139. So the scaled bound
improves the unscaled bound (17).
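The numbers in Example 4.5 can be reproduced with a short Python sketch. It recomputes the unscaled and scaled coherence quantities from the printed four-decimal entries of $A$ and of $W$, taking $W$ to be the matrix whose columns are $(-0.2078,-0.9381,0.6702)^T$, $(0.9393,0.5715,0.2228)^T$ and $(0.1905,0.3268,0.7662)^T$ (this orientation reproduces the reported $\mu_W(A)=0.8343$). It then evaluates (2) and the coherence-rank-one threshold, which reproduces the values the text reports for (17) and (26). Because the inputs are rounded, agreement is only to about three or four decimals; the helper names are illustrative.

```python
import math


def coherence_stats(A, W=None, tol=1e-9):
    """(mu, mu2, alpha) of the normalized columns of W A (W = I when None)."""
    if W is not None:
        A = [[sum(W[i][k] * A[k][j] for k in range(len(A))) for j in range(len(A[0]))]
             for i in range(len(W))]
    n = len(A[0])
    cols = [[row[j] for row in A] for j in range(n)]
    norms = [math.sqrt(sum(v * v for v in c)) for c in cols]
    G = [[abs(sum(p * q for p, q in zip(cols[i], cols[j]))) / (norms[i] * norms[j])
          for j in range(n)] for i in range(n)]
    off = [G[i][j] for i in range(n) for j in range(n) if i != j]
    mu = max(off)
    mu2 = max((g for g in off if g < mu - tol), default=0.0)
    alpha = max(sum(1 for j in range(n) if j != i and abs(G[i][j] - mu) <= tol)
                for i in range(n))
    return mu, mu2, alpha


def bound_2(mu):
    """Classical threshold (1 + 1/mu) / 2."""
    return 0.5 * (1 + 1 / mu)


def bound_rank1(mu, mu2):
    """Coherence-rank-one threshold: (17) for W = I, (26) for the scaled matrix."""
    return 0.5 * (1 + 1 / mu + (1 / mu2 - 1 / mu) * (1 - mu))


A = [[0.0010, -0.7998, -0.6002, 1.4290],
     [0.8001, -0.3558, 0.4798, 1.2393],
     [0.5999, 0.4801, -0.6398, -0.6849]]
W = [[-0.2078, 0.9393, 0.1905],
     [-0.9381, 0.5715, 0.3268],
     [0.6702, 0.2228, 0.7662]]

mu, mu2, alpha = coherence_stats(A)
mu_w, mu2_w, alpha_w = coherence_stats(A, W)
print("unscaled:", round(mu, 4), round(mu2, 4), alpha,
      round(bound_2(mu), 4), round(bound_rank1(mu, mu2), 4))
print("scaled:  ", round(mu_w, 4), round(mu2_w, 4), alpha_w,
      round(bound_2(mu_w), 4), round(bound_rank1(mu_w, mu2_w), 4))
```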

The examples above show that a scaling matrix can change the mutual coherence, and
may reduce the coherence rank as well. By a suitable scaling, we can further improve the mutual-coherence-based
uniqueness conditions for sparsest solutions of linear systems. Note that if the
OSMC is attainable, i.e., there exists a nonsingular $W^*$ such that $\mu^*(A)=\mu_{W^*}(A)$, then Theorem
4.3 holds for the OSMC. However, the optimal scaling $W^*$ is difficult to obtain in general. Also,
for a given linear system, it is not obvious in advance which scaling matrix should be used in order to improve
the uniqueness claims. Nevertheless, the scaled coherence can be viewed as a unified
method for developing other coherence-type conditions for the uniqueness of sparsest solutions.
It is worth mentioning that the Babel function can also be generalized to the weighted case, and
related uniqueness claims can be made as well.

4.2 Application 

Note that the existing uniqueness claims for sparsest solutions of linear systems are general and
hold uniformly for all $b$. These claims are made largely by using properties of $A$ only, and the
role of $b$, which is solution-dependent, has been overlooked. Clearly, the sparsest
solution usually depends on both $A$ and $b$. So it is interesting to incorporate the information of $b$ into
a uniqueness criterion for sparsest solutions. The scaled mutual coherence can be used to achieve
this goal. Indeed, let $\phi$ be a mapping from $\mathbb{R}^m$ to $\mathbb{R}^m$ whose components are all nonzero (for example,
a mapping into the positive orthant $\mathbb{R}^m_{++}$). Denote by $\Phi_u=\mathrm{diag}(\phi(u))$ the nonsingular diagonal matrix
with diagonal entries $\phi_i(u)\neq0$, $i=1,\dots,m$. Setting $u=b$, we see that the system $Ax=b$ is equivalent to

$$(\Phi_bA)x=\Phi_bb. \qquad (27)$$

For instance, we let $\phi(u)$ be separable, i.e., $\phi(u)=(\phi_1(u_1),\dots,\phi_m(u_m))^T$, and we define

$$\phi_i(t)=\begin{cases}1/t & \text{if } t\neq0,\\ 1 & \text{otherwise.}\end{cases} \qquad (28)$$

By this choice, we have $\Phi_bb=\mathrm{diag}(\phi(b))b=|\mathrm{sign}(b)|$. Note that $\mathrm{Spark}(A)=\mathrm{Spark}(\Phi_bA)$, and the
sparsity of solutions of the scaled system (27) is exactly the same as that of $Ax=b$. However, as
we have seen before, a scaling matrix may change the mutual coherence, and a suitable scaling may
improve the mutual-coherence-based uniqueness claims for sparsest solutions of a linear system.
Through a scaling matrix dependent on $b$, the contribution of $b$ to the uniqueness of sparsest
solutions can be demonstrated by the next two corollaries.

Corollary 4.6. If the system $Ax=b$, where $A\in\mathbb{R}^{m\times n}$ with $m<n$, has a solution satisfying
$\|x\|_0<\left(1+1/\mu(\Phi_bA)\right)/2$, then $x$ is the unique sparsest solution to the linear system.

This corollary follows straightaway from Theorems 4.2 and 1.1 applied to the scaled system (27).
By Theorem 4.3, this corollary can be improved when the scaled coherence rank $\alpha(\Phi_bA)$ is
relatively small, as indicated by the next result.

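Corollary 4.7. Let $A$ be an $m\times n$ matrix with $m<n$.

(i) Suppose that either $\alpha(\Phi_bA)\le1/\mu(\Phi_bA)$ and $\beta(\Phi_bA)<\alpha(\Phi_bA)$, or $\alpha(\Phi_bA)<1/\mu(\Phi_bA)$. If the
system $Ax=b$ has a solution $x$ obeying

$$\|x\|_0<\frac{1}{2}\left(1+\frac{2\left(1-\alpha(\Phi_bA)\beta(\Phi_bA)\tilde{\mu}(\Phi_bA)^2\right)}{\mu^{(2)}(\Phi_bA)\left\{\tilde{\mu}(\Phi_bA)\left(\alpha(\Phi_bA)+\beta(\Phi_bA)\right)+\sqrt{\left[\tilde{\mu}(\Phi_bA)\left(\alpha(\Phi_bA)-\beta(\Phi_bA)\right)\right]^2+4}\right\}}\right),$$

where $\tilde{\mu}(\Phi_bA):=\mu(\Phi_bA)-\mu^{(2)}(\Phi_bA)$, then $x$ is the unique sparsest solution to the linear system.
In particular, the same conclusion holds if $x$ obeys

$$\|x\|_0<\frac{1}{2}\left(1+\frac{1}{\mu(\Phi_bA)}+\left(\frac{1}{\mu^{(2)}(\Phi_bA)}-\frac{1}{\mu(\Phi_bA)}\right)\left(1-\alpha(\Phi_bA)\mu(\Phi_bA)\right)\right).$$

(ii) If $\phi$ is chosen such that $\mu(\Phi_bA)<1$ and $\alpha(\Phi_bA)=1$, then the solution $x$ of $Ax=b$
satisfying

$$\|x\|_0<\frac{1}{2}\left(1+\frac{1}{\mu(\Phi_bA)}+\left(\frac{1}{\mu^{(2)}(\Phi_bA)}-\frac{1}{\mu(\Phi_bA)}\right)\left(1-\mu(\Phi_bA)\right)\right) \qquad (29)$$

is the unique sparsest solution of the linear system.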



The next example shows that when b is involved, the uniqueness claim for sparsest solutions 
can be improved in some situations. 

Example 4.8. Consider the system $Ax=b$ where $A$ is a $3\times5$ matrix given by

$$A=\begin{pmatrix}1&-3&-6&4&-3\\2&3&-2&-2&3\\3&-2&1&0&4\end{pmatrix},\quad \mathrm{abs}(G)=\begin{pmatrix}1&0.1709&0.2922&0&0.6875\\0.1709&1&0.3330&0.8581&0.3656\\0.2922&0.3330&1&0.6984&0.4285\\0&0.8581&0.6984&1&0.6903\\0.6875&0.3656&0.4285&0.6903&1\end{pmatrix},$$



where abs(G) is the absolute Gram matrix of the normalized $A$. From abs(G), we see that $\mu(A)=0.8581$,
$\mu^{(2)}(A)=0.6984$, and $\alpha(A)=\beta(A)=1$. Thus the standard bound (2) is 1.0827, which
is improved to 1.1016 by (17). In order to see which $b$ can further improve these bounds, let us
randomly generate a vector $b$, for instance, $b=(3.6159,-3.5189,2.6954)^T$. Let $\phi$ be given by (28).
Then the absolute Gram matrix of the scaled matrix $\Phi_bA$ with normalized columns is given by



$$\mathrm{abs}(G(\Phi_bA))=\begin{pmatrix}
1.0000&0.3180&0.1608&0.0107&0.7833\\
0.3180&1.0000&0.2454&0.8042&0.1178\\
0.1608&0.2454&1.0000&0.6784&0.4231\\
0.0107&0.8042&0.6784&1.0000&0.5928\\
0.7833&0.1178&0.4231&0.5928&1.0000
\end{pmatrix},$$



from which we see that after this $b$-involved scaling, the coherence has changed to $\mu(\Phi_bA)=0.8042$
and $\mu^{(2)}(\Phi_bA)=0.7833$, and the coherence rank remains unchanged. The scaled bound
$(1+1/\mu(\Phi_bA))/2=1.1217$ and the scaled bound (29), which equals 1.1250, both improve the unscaled
bounds (2) and (17).
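Example 4.8 can be checked end to end with a short Python sketch: build $\Phi_b$ from the choice (28), scale the rows of $A$, and compare the coherence quantities and thresholds before and after scaling. The sketch takes the $(3,4)$ entry of $A$ to be $0$, which is what the printed abs(G) requires (its $(1,4)$ entry vanishes); the helper names are illustrative. Note that $\mathrm{abs}(G(\Phi_bA))$ depends only on $\Phi_b^2$, so the signs of the entries $1/b_i$ do not affect the coherence values.

```python
import math


def coherence_stats(A, tol=1e-9):
    """(mu, mu2, alpha) of the normalized columns of A (list of rows)."""
    n = len(A[0])
    cols = [[row[j] for row in A] for j in range(n)]
    norms = [math.sqrt(sum(v * v for v in c)) for c in cols]
    G = [[abs(sum(p * q for p, q in zip(cols[i], cols[j]))) / (norms[i] * norms[j])
          for j in range(n)] for i in range(n)]
    off = [G[i][j] for i in range(n) for j in range(n) if i != j]
    mu = max(off)
    mu2 = max((g for g in off if g < mu - tol), default=0.0)
    alpha = max(sum(1 for j in range(n) if j != i and abs(G[i][j] - mu) <= tol)
                for i in range(n))
    return mu, mu2, alpha


def phi(t):
    """Componentwise choice (28): 1/t if t != 0, and 1 otherwise."""
    return 1.0 / t if t != 0 else 1.0


def bounds(mu, mu2):
    """(threshold (2), coherence-rank-one threshold used in (17) and (29))."""
    return 0.5 * (1 + 1 / mu), 0.5 * (1 + 1 / mu + (1 / mu2 - 1 / mu) * (1 - mu))


A = [[1, -3, -6, 4, -3],
     [2, 3, -2, -2, 3],
     [3, -2, 1, 0, 4]]
b = [3.6159, -3.5189, 2.6954]

d = [phi(bi) for bi in b]                           # diagonal of Phi_b
PhiA = [[di * a for a in row] for di, row in zip(d, A)]
Phib = [di * bi for di, bi in zip(d, b)]            # |sign(b)| up to rounding

mu, mu2, alpha = coherence_stats(A)
mu_b, mu2_b, alpha_b = coherence_stats(PhiA)
print("A:      ", round(mu, 4), round(mu2, 4), alpha, [round(v, 4) for v in bounds(mu, mu2)])
print("Phi_b A:", round(mu_b, 4), round(mu2_b, 4), alpha_b,
      [round(v, 4) for v in bounds(mu_b, mu2_b)])
```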

We now consider another application of the scaled mutual coherence. Without loss of generality,
we assume that $A$ has full rank. Let $A=U\Sigma V^T$ be the singular value decomposition, where $\Sigma$ is an
$m\times m$ diagonal matrix with the singular values as its diagonal entries. We choose the scaling matrix
$M=\Sigma^{-1}U^T$, which is nonsingular. Then

$$\mathrm{Spark}(A)=\mathrm{Spark}(MA)=\mathrm{Spark}(V^T),\qquad \mu(MA)=\mu(V^T).$$

This implies that the lower bound of $\mathrm{Spark}(A)$ can be computed by using $V$ instead of $A$ itself.
Thus, uniqueness claims for sparsest solutions can be restated in terms of the nonsquare orthogonal matrix
$V^T$. For completeness, we summarize this result as follows.

Corollary 4.9. Let $A$ be an $m\times n$ full-rank matrix with $m<n$, and let $A=U\Sigma V^T$ be a
singular value decomposition, where $\Sigma$ is the diagonal matrix of singular values.

(i) If the system $Ax=b$ has a solution satisfying $\|x\|_0<\left(1+1/\mu(V^T)\right)/2$, then the system has
a unique sparsest solution.

(ii) Suppose that either $\alpha(V^T)\le1/\mu(V^T)$ and $\beta(V^T)<\alpha(V^T)$, or $\alpha(V^T)<1/\mu(V^T)$. If the system
$Ax=b$ has a solution $x$ obeying

$$\|x\|_0<\frac{1}{2}\left(1+\frac{2\left(1-\alpha(V^T)\beta(V^T)\tilde{\mu}(V^T)^2\right)}{\mu^{(2)}(V^T)\left\{\tilde{\mu}(V^T)\left(\alpha(V^T)+\beta(V^T)\right)+\sqrt{\left[\tilde{\mu}(V^T)\left(\alpha(V^T)-\beta(V^T)\right)\right]^2+4}\right\}}\right),$$

where $\tilde{\mu}(V^T):=\mu(V^T)-\mu^{(2)}(V^T)$, then $x$ is the unique sparsest solution to the linear system. In
particular, the conclusion is valid if the following condition holds:

$$\|x\|_0<\frac{1}{2}\left(1+\frac{1}{\mu(V^T)}+\left(\frac{1}{\mu^{(2)}(V^T)}-\frac{1}{\mu(V^T)}\right)\left(1-\alpha(V^T)\mu(V^T)\right)\right).$$

(iii) If $\mu(V^T)<1$ and $\alpha(V^T)=1$, and if a solution $x$ of $Ax=b$ obeys

$$\|x\|_0<\frac{1}{2}\left(1+\frac{1}{\mu(V^T)}+\left(\frac{1}{\mu^{(2)}(V^T)}-\frac{1}{\mu(V^T)}\right)\left(1-\mu(V^T)\right)\right),$$

then $x$ is the unique sparsest solution of the linear system.
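Evaluating $\mu(V^T)$ does not require an explicit SVD routine. For a full-row-rank $A$ with $A=U\Sigma V^T$ ($U$ orthogonal, $\Sigma$ nonsingular), the Gram matrix of the columns of $V^T$ is

$$VV^T=V\Sigma U^T(U\Sigma^2U^T)^{-1}U\Sigma V^T=A^T(AA^T)^{-1}A,$$

the orthogonal projector onto the row space of $A$. The Python code below is a minimal sketch built on this identity (the helper names are illustrative, and exact `Fraction` arithmetic assumes integer or rational entries). It is applied to the matrix of Example 4.8, whose spark is $4$, so $\mu(V^T)\ge1/3$ by Theorem 4.2.

```python
import math
from fractions import Fraction


def transpose(X):
    return [list(r) for r in zip(*X)]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def inverse(M):
    """Exact inverse of a nonsingular square matrix by Gauss-Jordan elimination."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        piv = next(i for i in range(c, n) if aug[i][c] != 0)  # fails if M is singular
        aug[c], aug[piv] = aug[piv], aug[c]
        p = aug[c][c]
        aug[c] = [v / p for v in aug[c]]
        for i in range(n):
            if i != c and aug[i][c] != 0:
                f = aug[i][c]
                aug[i] = [a - f * e for a, e in zip(aug[i], aug[c])]
    return [row[n:] for row in aug]


def row_space_projector(A):
    """P = A^T (A A^T)^{-1} A, which equals V V^T for A = U S V^T."""
    At = transpose(A)
    return matmul(matmul(At, inverse(matmul(A, At))), A)


def coherence_from_gram(P):
    """Mutual coherence of vectors whose Gram matrix is P."""
    n = len(P)
    return max(abs(float(P[i][j])) / math.sqrt(float(P[i][i] * P[j][j]))
               for i in range(n) for j in range(n) if i != j)


A = [[1, -3, -6, 4, -3],
     [2, 3, -2, -2, 3],
     [3, -2, 1, 0, 4]]
P = row_space_projector(A)
print(coherence_from_gram(P))   # mu(V^T)
```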



Remark 4.10. Consider the sparsest solution of the linear system in matrix form

$$\mathcal{A}\cdot x=\sum_{i=1}^{N}x_iA_i=B,$$

where $A_i\in\mathbb{R}^{m\times q}$, $i=1,\dots,N$, and $B\in\mathbb{R}^{m\times q}$ are given matrices, and $mq<N$. The above system
can be written as a linear system in vector form by using

$$A=[\mathrm{vec}(A_1),\dots,\mathrm{vec}(A_N)],\qquad b=\mathrm{vec}(B),$$

where $\mathrm{vec}(A_i)$ is the vector obtained by stacking the columns of $A_i$ on top of one another. Then $A$ is
an $(mq)\times N$ matrix. We also assume that $\mathcal{A}$ is normalized in the sense that $\|\mathrm{vec}(A_i)\|_2=1$ for
$i=1,\dots,N$. To comply with the matrix form, we define the Gram matrix of the linear operator
$\mathcal{A}$ as

$$G(\mathcal{A})=\begin{pmatrix}\mathrm{tr}(A_1^TA_1)&\cdots&\mathrm{tr}(A_1^TA_N)\\ \vdots&\ddots&\vdots\\ \mathrm{tr}(A_N^TA_1)&\cdots&\mathrm{tr}(A_N^TA_N)\end{pmatrix}$$

and the mutual coherence as

$$\mu(\mathcal{A})=\max_{i\neq j}\frac{|\mathrm{tr}(A_i^TA_j)|}{\|A_i\|_F\cdot\|A_j\|_F},$$

where $\|\cdot\|_F$ denotes the Frobenius norm. Similarly, we define

$$\mu^{(2)}(\mathcal{A})=\max_{i\neq j}\left\{\frac{|\mathrm{tr}(A_i^TA_j)|}{\|A_i\|_F\cdot\|A_j\|_F}:\ \frac{|\mathrm{tr}(A_i^TA_j)|}{\|A_i\|_F\cdot\|A_j\|_F}<\mu(\mathcal{A})\right\}$$

as the second largest coherence, and $\alpha(\mathcal{A})$ is the maximum number of absolute entries equal to
$\mu(\mathcal{A})$ in a row of $G(\mathcal{A})$. Then the results in previous sections can be easily transferred to the
sparsest solution of a linear system in matrix form.
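The matrix-form coherence of Remark 4.10 coincides with the ordinary coherence of the stacked matrix $A=[\mathrm{vec}(A_1),\dots,\mathrm{vec}(A_N)]$, because $\mathrm{tr}(A_i^TA_j)=\mathrm{vec}(A_i)^T\mathrm{vec}(A_j)$. The Python code below is a minimal sketch that checks this on three small hypothetical $2\times2$ matrices (not taken from the paper); the helper names are illustrative.

```python
import math


def frob_inner(X, Y):
    """tr(X^T Y), i.e. the sum of entrywise products of X and Y."""
    return sum(x * y for rx, ry in zip(X, Y) for x, y in zip(rx, ry))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def vec(X):
    """Stack the columns of X on top of one another."""
    return [X[i][j] for j in range(len(X[0])) for i in range(len(X))]


def coherence_of(items, inner):
    """Largest normalized |inner(u, v)| over distinct pairs of items."""
    norms = [math.sqrt(inner(u, u)) for u in items]
    n = len(items)
    return max(abs(inner(items[i], items[j])) / (norms[i] * norms[j])
               for i in range(n) for j in range(n) if i != j)


mats = [[[1, 0], [0, 0]],
        [[1, 1], [0, 0]],
        [[0, 0], [1, 1]]]
mu_matrix = coherence_of(mats, frob_inner)              # trace-based definition
mu_vector = coherence_of([vec(X) for X in mats], dot)   # coherence of the stacked matrix
print(mu_matrix, mu_vector)
```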






5 A further improvement via support overlap 



Many uniqueness conditions for sparsest solutions of a linear system were derived from Theorem
1.1 by using a lower bound of $\mathrm{Spark}(A)$. In this section, we point out that Theorem 1.1 itself
might be improved in some situations by the support overlap of solutions of a linear system,
leading to an enhanced spark-type uniqueness condition. We use $\mathrm{Supp}(x)$ to denote the support
of $x$, i.e., $\mathrm{Supp}(x)=\{i: x_i\neq0\}$.

Definition 5.1. The support overlap $S^*$ of the solutions of $Ax=b$ is the index set

$$S^*=\bigcap_{x\in\mathcal{Y}}\mathrm{Supp}(x),$$

where $\mathcal{Y}=\{x: Ax=b\}$ is the solution set of the linear system.

Clearly, $S^*$ might be empty if the supports of the solutions have no common index. However,
when some columns of $A$ are crucial and must be used for the representation of $b$, the support
overlap $S^*$ is nonempty.

Theorem 5.2. Let $S^*$ be the support overlap of the solutions of the system $Ax=b$. If the
system has a solution $x$ obeying

$$\|x\|_0<\frac{1}{2}\left(|S^*|+\mathrm{Spark}(A)\right), \qquad (30)$$

then $x$ is the unique sparsest solution of the linear system.

Proof. Let $x$ be a solution of the system $Ax=b$ obeying (30). We now prove that it is the
unique sparsest solution of the linear system. Assume the contrary that the linear system
has a solution $y\neq x$ with $\|y\|_0\le\|x\|_0$. Since $A(y-x)=0$, the columns
$a_i$, $i\in\mathrm{Supp}(y-x)$, of $A$ are linearly dependent, and hence

$$\|y-x\|_0=|\mathrm{Supp}(y-x)|\ge\mathrm{Spark}(A). \qquad (31)$$

Note that for any $u,v\in\mathbb{R}^n$, the value of $\|\mathrm{diag}(u)v\|_0$ is the number of $i$'s such that $u_iv_i\neq0$, i.e.,
$\|\mathrm{diag}(u)v\|_0=|\mathrm{Supp}(u)\cap\mathrm{Supp}(v)|$. Since $S^*\subseteq\mathrm{Supp}(u)\cap\mathrm{Supp}(v)$ for all $u,v\in\mathcal{Y}$, we have

$$\|\mathrm{diag}(u)v\|_0\ge|S^*|\quad\text{for all } u,v\in\mathcal{Y}.$$

Moreover, since $\mathrm{Supp}(u-v)\subseteq\mathrm{Supp}(u)\cup\mathrm{Supp}(v)$, for any $u,v\in\mathbb{R}^n$ we have

$$\|u-v\|_0\le\|u\|_0+\|v\|_0-\|\mathrm{diag}(u)v\|_0,$$

and hence

$$\|y-x\|_0\le\|x\|_0+\|y\|_0-\|\mathrm{diag}(x)y\|_0\le2\|x\|_0-\|\mathrm{diag}(x)y\|_0\le2\|x\|_0-|S^*|, \qquad (32)$$

where the second inequality follows from $\|y\|_0\le\|x\|_0$ and the third inequality follows from the fact
that $\|\mathrm{diag}(x)y\|_0\ge|S^*|$ for $x,y\in\mathcal{Y}$. It follows from (31) and (32) that $2\|x\|_0-|S^*|\ge\mathrm{Spark}(A)$,
which contradicts (30). Thus $x$ is the unique sparsest solution of the linear system. □
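For small rational systems, $S^*$ can be computed exactly. Assuming $Ax=b$ is consistent, an index $i$ belongs to $S^*$ precisely when $b$ is not in the span of the remaining columns, i.e., when appending $b$ to $A$ with column $i$ deleted raises the rank. The Python code below is a minimal sketch that uses this characterization together with a brute-force spark to test condition (30). The helper names and the $2\times3$ system are hypothetical illustrations chosen for this sketch; they are not the system of Example 5.4.

```python
import itertools
import math
from fractions import Fraction


def rank(rows):
    """Exact rank of a rational matrix (list of rows) by Gauss-Jordan elimination."""
    M = [[Fraction(x) for x in r] for r in rows]
    ncols = len(M[0]) if M else 0
    r = 0
    for c in range(ncols):
        piv = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * e for a, e in zip(M[i], M[r])]
        r += 1
    return r


def spark(A):
    """Smallest number of linearly dependent columns (brute force)."""
    n = len(A[0])
    for p in range(1, n + 1):
        for S in itertools.combinations(range(n), p):
            if rank([[row[j] for j in S] for row in A]) < p:
                return p
    return math.inf


def support_overlap(A, b):
    """S* for a consistent system Ax = b: indices i with x_i != 0 in every solution."""
    n = len(A[0])
    S = set()
    for i in range(n):
        others = [[row[j] for j in range(n) if j != i] for row in A]
        augmented = [r + [bi] for r, bi in zip(others, b)]
        if rank(augmented) > rank(others):  # b needs column i
            S.add(i)
    return S


# Hypothetical system: columns 0 and 1 are parallel; only column 2 reaches b.
A = [[1, 1, 0],
     [0, 0, 1]]
b = [0, 1]
x = [0, 0, 1]  # a solution of Ax = b

S = support_overlap(A, b)
l0 = sum(1 for v in x if v != 0)
print(S, spark(A))
print("Theorem 1.1 certifies x:", l0 < spark(A) / 2)
print("Condition (30) certifies x:", l0 < (len(S) + spark(A)) / 2)
```

Here $\mathrm{Spark}(A)=2$, so Theorem 1.1 alone cannot certify even a $1$-sparse solution, while $|S^*|=1$ lifts the threshold in (30) to $1.5$.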

As a result, all previous mutual-coherence-type uniqueness criteria for sparsest solutions of
a linear system can be further improved when the value of $|S^*|$ or a lower bound of it is available.
Taking Theorem 2.10(iii) as an example, we have the following result.

Corollary 5.3. Let $A\in\mathbb{R}^{m\times n}$, where $m<n$, be a matrix with $\mu(A)<1$ and $\alpha(A)=1$.
Suppose that $|S^*|\ge\gamma^*$, where $\gamma^*$ is known. Then if the system $Ax=b$ has a solution $x$ obeying

$$\|x\|_0<\frac{1}{2}\left(\gamma^*+1+\frac{1}{\mu(A)}+\left(\frac{1}{\mu^{(2)}(A)}-\frac{1}{\mu(A)}\right)\left(1-\mu(A)\right)\right), \qquad (33)$$

$x$ is the unique sparsest solution of the linear system.
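Since the right-hand side of (33) is the coherence-rank-one threshold plus $\gamma^*/2$, each unit of known support overlap raises the threshold by exactly $0.5$. A one-function Python transcription (the name `bound_33` is illustrative), evaluated with the unscaled values $\mu(A)=0.8581$ and $\mu^{(2)}(A)=0.6984$ of Example 4.8, shows this; with $\gamma^*=0$ it reproduces the value 1.1016 reported there for (17).

```python
def bound_33(gamma, mu, mu2):
    """Right-hand side of (33): (gamma* + 1 + 1/mu + (1/mu2 - 1/mu)(1 - mu)) / 2."""
    return 0.5 * (gamma + 1 + 1 / mu + (1 / mu2 - 1 / mu) * (1 - mu))


mu, mu2 = 0.8581, 0.6984   # unscaled values from Example 4.8
for gamma in (0, 1, 2):
    print(gamma, round(bound_33(gamma, mu, mu2), 4))
```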



When the support overlap $S^*$ is nonempty, we have $|S^*|\ge1$, so all the aforementioned
mutual-coherence-type bounds for uniqueness of sparsest solutions can be further improved by at least
0.5. Such an improvement can be crucial, as shown by the next example.

Example 5.4. Consider the system Ax = b where 





' -1 -4 


2 


4 " 




2 


A = 


-1 -1 


1 


2 


, b = 


1/2 




0-1 










.V2_ 



Clearly, the last two columns are linearly dependent. So $\mathrm{Spark}(A)=2$, and Theorem 1.1 cannot
confirm the uniqueness of any sparsest solution. However, note that the third column of $A$ is vital
and must be used to represent $b$. This means that $x_3\neq0$ for any solution of the linear system.
So $|S^*|\ge1=\gamma^*$. Note that the solution $x^*=(0,0,1/2,0,0)^T$ satisfies

$$\|x^*\|_0=1<1.5=(\gamma^*+\mathrm{Spark}(A))/2\le(|S^*|+\mathrm{Spark}(A))/2.$$

By Theorem 5.2, $x^*$ is the unique sparsest solution of the linear system. This example shows that
by incorporating the support overlap $S^*$, the result of Theorem 1.1 can be remarkably improved
when $S^*\neq\emptyset$.



6 Uniqueness via range property of $A^T$

The exact recovery of all $k$-sparse vectors in $\mathbb{R}^n$ by a single matrix $A$ is called uniform
recovery. To uniformly recover sparse vectors, some matrix properties should be imposed on $A$.
The restricted isometry property (RIP) [11] and the null space property (NSP) [13, 35] are two well-known
conditions for uniform recovery. Recently, the so-called range space property (RSP) of order
$k$ was proposed in [36], which can also characterize uniform recovery. All uniform recovery
conditions imply that the linear system $Ax=y:=Ax^0$, where $x^0$ is $k$-sparse, has a unique sparsest solution. In fact,
these conditions do more than just ensure the uniqueness of sparsest solutions of
a linear system. For instance, they also guarantee that a linear system has a unique least $\ell_1$-norm
solution, leading to the equivalence between $\ell_0$- and $\ell_1$-minimization problems, which is
fundamental for the development of compressed sensing theory. In this section, we briefly discuss
and develop certain more relaxed range properties of $A^T$ that guarantee the uniqueness of sparsest
solutions. Our first range property, defined as follows, was first introduced in [37] for a
theoretical analysis of reweighted $\ell_1$-methods for the sparse solution of a linear system.

Definition 6.1 (Range Property (I)). Let $A$ be a full-rank $m\times n$ matrix with $m<n$. Let $B$
be an $(n-m)\times n$ matrix whose rows form a basis of the null space of $A$. $B^T$ is said to satisfy the
range space property (RSP) of order $k$ with a constant $\rho>0$ if

$$\|\xi_{\bar J}\|_1\le\rho\|\xi_J\|_1$$

for all $\xi\in\mathcal{R}(B^T)$, the range space of $B^T$, where $J\subseteq\{1,\dots,n\}$ with $|J|=k$ is the set of indices of the $k$
smallest absolute components of $\xi$, and $\bar J=\{1,\dots,n\}\setminus J$.

Based on the above definition, we have the next result.

Theorem 6.2. Let $A\in\mathbb{R}^{m\times n}$ and $B\in\mathbb{R}^{(n-m)\times n}$ be full-rank matrices satisfying $AB^T=0$,
where $m<n$. Suppose that $B^T$ has the RSP of order $(n-k)$. Then the solution $x$ of the system
$Ax=b$ obeying $\|x\|_0\le k/2$ is the unique sparsest solution of the linear system.

Proof. First, under the condition of the theorem, we have the following statement (see e.g.,
Proposition 3.6 in [37]): $B^T$ has the RSP of order $(n-k)$ with a constant $\rho>0$ if and only if
$A$ has the NSP of order $k$ with the same constant $\tau=\rho$. Therefore, by the definition of the NSP of
order $k$, we have $\|\eta_\Lambda\|_1\le\tau\|\eta_{\bar\Lambda}\|_1$ for any $\eta\in\mathcal{N}(A)$ and all $\Lambda\subseteq\{1,2,\dots,n\}$ with $|\Lambda|\le k$, where
$\bar\Lambda=\{i: i\notin\Lambda\}$. This implies that a solution $x$ with $\|x\|_0\le k/2$ must be unique. In fact,
note that any two $(k/2)$-sparse solutions $x$ and $y$ satisfy $A(x-y)=0$, i.e., $x-y\in\mathcal{N}(A)$. Let
$\Lambda=\mathrm{Supp}(x-y)$. Since $x-y$ is at most $k$-sparse, we have $|\Lambda|\le k$. By the NSP of order $k$, we
have

$$\|x-y\|_1=\|(x-y)_\Lambda\|_1\le\tau\|(x-y)_{\bar\Lambda}\|_1=0,$$

which implies that $x=y$. Thus the $(k/2)$-sparse solution is the unique sparsest solution of
the linear system. □

The above theorem imposes a range property on the basis of the null space of $A$, instead of on
$A$ itself. We now impose a range property on $A$ directly.

Definition 6.3 (Range Property (II)). There exists an integer $k$ such that for any disjoint
subsets $\Lambda_1,\Lambda_2$ of $\{1,\dots,n\}$ with $|\Lambda_1|+|\Lambda_2|=k$ and $|\Lambda_2|\le1$, the range space $\mathcal{R}(A^T)$ contains a
vector $\eta$ satisfying $\eta_i=1$ for all $i\in\Lambda_1$, $\eta_i=-1$ for all $i\in\Lambda_2$, and $|\eta_i|<1$ for $i\notin\Lambda_1\cup\Lambda_2$.

The above definition is a relaxed version of the range property introduced in [36]. Under the 
above range property (II), we can prove the following result. 

Theorem 6.4. Suppose that $A\in\mathbb{R}^{m\times n}$ with $m<n$ satisfies the range property (II). Then
if the system $Ax=b$ has a solution obeying $\|x\|_0\le k/2$, $x$ is the unique sparsest solution of the
linear system.

Proof. Under the range property (II), we first prove that any $k$ columns of $A$ are linearly
independent. In fact, let $\Lambda=\{\gamma_1,\dots,\gamma_k\}$ be an arbitrary subset of $\{1,\dots,n\}$ with $|\Lambda|=k$.
We now prove that the columns of $A_\Lambda$ are linearly independent. It is sufficient to show that
$z_\Lambda=0$ is the only solution to the system $A_\Lambda z_\Lambda=0$. In fact, let us assume $A_\Lambda z_\Lambda=0$. Then
$z=(z_\Lambda, z_{\bar\Lambda}=0)\in\mathbb{R}^n$ is in $\mathcal{N}(A)$. Consider the disjoint sets $\Lambda_1=\Lambda$ and $\Lambda_2=\emptyset$. By the range
property (II), there exists a vector $\eta\in\mathcal{R}(A^T)$ with $\eta_i=1$ for all $i\in\Lambda_1=\Lambda$. By the orthogonality
of $\mathcal{N}(A)$ and $\mathcal{R}(A^T)$, we have

$$0=z^T\eta=z_\Lambda^T\eta_\Lambda+z_{\bar\Lambda}^T\eta_{\bar\Lambda}=z_\Lambda^T\mathbf{1}_\Lambda,$$

which is nothing but

$$z_{\gamma_1}+z_{\gamma_2}+\cdots+z_{\gamma_k}=0. \qquad (34)$$

Now we consider an arbitrary pair of disjoint sets

$$\Lambda_1=\Lambda\setminus\{\gamma_i\},\qquad \Lambda_2=\{\gamma_i\},$$

which satisfy $|\Lambda_1|+|\Lambda_2|=k$ and $|\Lambda_2|\le1$. By the range property (II), there exists an
$\eta\in\mathcal{R}(A^T)$ with $\eta_{\gamma_j}=1$ for every $j\neq i$ and $\eta_{\gamma_i}=-1$. Again, it follows from $z^T\eta=0$ that

$$\left(z_{\gamma_1}+\cdots+z_{\gamma_{i-1}}+z_{\gamma_{i+1}}+\cdots+z_{\gamma_k}\right)-z_{\gamma_i}=0,$$

which holds for every $i$ with $1\le i\le k$. Combining these relations with (34) implies that $z_{\gamma_i}=0$
for all $i=1,\dots,k$, i.e., $z_\Lambda=0$. So any $k$ columns of $A$ are linearly independent. This implies that
$k<\mathrm{Spark}(A)$. The desired result follows immediately from Theorem 1.1. □

7 Conclusions 

Through such concepts as sub-mutual coherence, scaled mutual coherence, coherence rank, and 
sub-Babel function, we have developed several new and improved sufficient conditions for a linear 
system to have a unique sparsest solution. The key result established in this paper claims that 
when the coherence rank of a matrix is low, the mutual-coherence-based lower bound for the spark 
of a matrix can be improved. We have also demonstrated that the scaled mutual coherence, which
yields a unified uniqueness claim, may further improve the unscaled coherence-based uniqueness
conditions if a suitable scaling matrix is used. The scaled mutual coherence enables us to integrate
the right-hand-side vector b of a linear system and the orthogonal matrix obtained from the singular
value decomposition of A into uniqueness criteria for the sparsest solution of a linear system.
Moreover, the support overlap of solutions and certain range property of a matrix also play an 
important role in the uniqueness of sparsest solutions. 